Preface


THE PRESENT BOOK comprises a
collection of hitherto unpublished papers which were read in Baton 
Rouge at a Louisiana State University Psychology Symposium, Feb- 
ruary 1958, the aim of which was to present the most recent concepts 
and developments in objective personality assessment. Accordingly, 
nationally known experts from virtually every section of the country 
were invited to read papers concerned with their current thinking 
about problems of objective personality measurement, together with 
a description of some of their present research and its background. 

The final result is considered to be an accurate representation of 
what leaders in the field of objective personality testing are now 
doing. This volume tells what many of the personality tests of 
tomorrow will look like and what the rationale behind them will be. 
There was, of course, no general agreement about personality meas- 
urement among the authors. About the only point of real agreement 
was that personality testing should be objective. Even here, the 
historian of the symposium, Robert I. Watson, a clinician of note 
though not a dedicated objective measurement psychologist, was 
moved to protest when the clinical usefulness of the Rorschach and 
other projective devices was seriously questioned during one of the 
discussion periods.

Each of the contributors has developed a definite approach to 
measuring one or more facets of personality; and he acknowledges, 
at least by implication, a serene confidence in his methodology. He 
sees room for improvement, of course, in what he is doing and he is 
willing to discuss it. He sees even greater room for improvement in
what his colleagues are doing, at least as they revealed their professional
activities at the symposium.

It is difficult to capture in print the total atmosphere of a
symposium where oral exchanges are many, where facial expressions
often convey as much meaning as the words used. If
one were to describe the general flavor of the meetings, it would be
that one distinctly felt that the speakers knew what they were doing
and that they enjoyed doing it. This was reflected in the willingness,
almost eagerness, of each speaker to reply to all questions raised by
members of a sizeable audience and to answer without cavil.

Thus far, the present editors have been referring to the
speakers who assembled to present their papers. Each of the
speakers, it is true, exerted some form of catalytic effect upon the
others; yet there was more. There was the audience, of course, and
there were several people who were invited because the editors, in
planning the symposium, believed these persons would contribute
directly to the atmosphere of scholarly enthusiasm. There was, for
instance, H. Max Houtchens of the Veterans Administration Central
Office in Washington. He has a trick of quietly asking some of the
most pointed questions in the kindliest way imaginable and serves,
thereby, as a first-order clarifier of cloudy issues.

Out of the informal interchange several points appeared. The
speakers had nothing against projective tests as such but they liked
objective tests better, at least as they conceived of them. As may
be noted in the first chapter, the conceptions of objective vary somewhat,
even among a group of experts such as the authors of this
book. But objective in their fashion, the speakers at the symposium
liked objective tests for reasons which may be summarized as
follows:

1. They are usually easier to administer (often requiring little
training of test examiners).

2. They are more easily evaluated. It is usually much easier to
gauge the reliability and validity of objective than subjective tests.
It is thus easier to discover the errors in one’s measurements, which
is less often true in the case of subjective assessment.

3. Objective testing is more likely to contribute to constructing a
body of theory and generalization about human behavior. If there is
to be a true “science of personality,” a body of integrated constructs
anchored by operational definitions to observables, the observables
will have to be measured objectively.

The present symposium is part of the graduate training program
in psychology at Louisiana State University. It would have been
much smaller in scope were it not for the broad professional vision
of the administrators of the Louisiana State Department of Hospi- 
tals. This state agency supplies funds for training and research in 
psychiatry, psychiatric social work, psychiatric nursing, and clinical 
psychology, with provision for visiting lecturers being typically in- 
cluded in such grants. This is one product of the firm conviction of 
the Department of Hospital administrators that those who serve 
the State should have the best training possible. Therefore it is a 
pleasure to thank Mr. Jesse H. Bankston, Director of the Department 
of Hospitals, and his program directors, Mr. Winborn E. Davis and 
Mr. E. R. Rogillio, for their encouragement and for their uncom- 
promising stand on quality in professional training. The present 
symposium is but one product of their far-sightedness. 

Manuscript preparation is always a chore and sometimes irksome. 
The present task was less irksome than usual because Mr. Arthur 
Kaufman checked the references, while Mrs. Vera Foil and Mrs. 
Floy Brown did the final typing of the manuscript. Mrs. Sylvia Berg 
is to be thanked heartily for her work with the Author Index; Mrs. 
June T. Bradford for her completion of the Subject Index. 


BERNARD M. BASS
IRWIN A. BERG
Baton Rouge, Louisiana


Historical Review of Objective
Personality Testing: The Search 
for Objectivity 


Robert I. Watson
Northwestern University 


THE NOTED philosopher of science,
Herbert Feigl, considers a major standard of science to be inter- 
subjective testability. Concerning intersubjective testability he 
writes: 


This is only a more adequate formulation of what is generally meant by the 
“objectivity” of science. What is here involved is not only the freedom from
personal or cultural bias or partiality, but—even more fundamentally—the re- 
quirement that the knowledge claims of science be in principle capable of test 
(confirmation or disconfirmation, at least indirectly and to some degree) on the 
part of any person properly equipped with intelligence and the technical de- 
vices of observation or experimentation. The term intersubjective stresses the 
social nature of the scientific enterprise. If there be any “truths” that are acces- 
sible only to privileged individuals, such as mystics or visionaries—that is, 
knowledge-claims which by their very nature cannot independently be checked 
by anyone else—then such “truths” are not of the kind that we seek in the 
sciences. The criterion of intersubjective testability thus delimits the scientific 
from the nonscientific activities of man. (15, p. 11). 


I would add that objectivity is a goal of science, not a prerequisite 
for scientific endeavors. Objectivity is not absolute, but relative. It 
is not unusual in science for basic phenomena to be first described 
in a qualitative way. Objective methods emerge only upon more 
intensive study. Our efforts are in the direction of increasing objectivity
whenever possible, but this does not mean that we can
neglect problems simply because they are not yet objective. At 
least some of us select problems which we feel are capable of being 
rendered more objective and make our research task this search for 
increasing objectivity. For example, in Chapter 9 by Hunt, we 
shall find that a demonstration of the reliability of clinical judgment 
is a means whereby the clinician, himself, becomes a more objective 
instrument. In the present chapter, a major theme is the search for 
objectivity in personality testing.

Of necessity, my topic must be considered in a somewhat nar- 
rower framework than the entire scope of personality theory. A 
major omission in the consideration of objectivity in personality 
evaluation is the argument as to why one should go beyond the 
objective approach, as advanced by philosophical and phenomenological
characterologists. This otherwise serious omission is tempered
somewhat by the fact that the characterologists have not been
particularly interested in psychological testing, despite the fact that
many projective tests can be shown to be interpretable on phenomenological
principles.

In order to place objective personality testing in historical perspective,
it is necessary to say something about objective psychological
testing in general. We cannot ignore early mental testing in
our search for the beginnings of personality testing, for to do so
would be to ignore a truism of historical research: that the beginnings
of attention to a topic may not be referred to in the same
manner as it is referred to in later years. Personality, as our various
conceptions now regard it, was not a systematic rubric in the earlier
psychological traditions. Lack of specific reference, however, does
not prevent us from seeing, in the perspective of the present day,
some of the aspects of what we now call personality. So we begin
narrowly by considering, not the history of personality testing,
but the classification of mental tests adopted by Whipple in his
Manual of Mental and Physical Tests (37). Mental tests served the
purpose of determining and measuring some phase of mental capacity
or trait. I would
like to add parenthetically that even in 1910 he could plead that 
what was needed was not new tests, but an exhaustive investigation 
of those already available. But to return to his classification of 
mental tests, the major headings (disregarding the anthropometric)
were physical and motor capacity, sensory capacity, attention and 
perception, description and report, association, learning and memory,
suggestibility, imagination and invention, and intelligence. Note, 
there is no mention of personality. However, the tests of suggesti- 
bility and imagination and invention could be called personality 
tests in today’s perspective. It would also be possible to include 
description and report in the scope of personality. Note also that 
tests for emotion were not mentioned, for measures of emotion came 
later. Many of the personality questionnaires of the twenties were 
called measures of emotionality. So, too, were more objective 
efforts, such as the X-O, or cross-out tests of Pressey, in which the 
number of words found to be unpleasant was the affectivity score. 
With this justification of what is included in the discussion to 
follow, we may now turn to the history of mental tests. 

Sir Francis Galton shares with James McKeen Cattell the
founding of psychological testing. As early as 1882, Galton had 
established a small laboratory in London where, for a small fee, 
individuals could take a series of physical measurements and tests 
of reaction time and sensory acuity. (One might ask in passing 
whether this payment meant that he was the first psychological 
practitioner.) The very fact that he thought people would be interested
in their standing on these measures shows their test-orientation.

Galton was primarily interested in no less than an inventory of 
human abilities. He related these to his evolutionary views and to 
his studies of inheritance, but the fact remains that he conceived
of his various measures as tapping as broad a spectrum of psycho- 
logical characteristics as was possible. If the term, personality, had 
been used as it is now, I believe he would not have hesitated to use 
it to describe some of his efforts. 

In a paper published in 1890, in which he coined the term, mental
tests, Cattell proposed a standard series of tests to be applied for
“discovering the constancy of mental processes, their interdependence,
and their variation under different circumstances” (6, p. 373).
He offered both a select list of ten tests then being used in the Psy- 
chological Laboratory of the University of Pennsylvania, and a 
longer list of 50 others proposed for further consideration. The ten
tests were dynamometer pressure, rate of movement, two-point
threshold, pain sensitivity, least noticeable difference in weight, 
reaction time for sound, time for naming colors, bisection of a 
50 cm. line, judgment of ten seconds time, and the number of
letters repeated on once hearing. The list of 50 was essentially
similar. The fact that Galton (6) contributed a number of com- 
ments at the end of this article gives unequivocal evidence of the 
connection between Galton’s interest in individual differences and 
the mental-test movement. 

Earlier, in 1883, with his already formed interests in individual 
differences and in reaction time as a measure of intelligence, Cattell 
had gone to Wundt’s laboratory. Here he had completed his doctoral
dissertation on his own problem of individual differences in
reaction times, which Wundt, it might be added, had viewed dubiously.
[...] University of Pennsylvania, he [...] where he continued his
testing program [...] same battery of tests. After several years’ [...]
a monograph by Clark Wissler (39) appeared [...] the findings.
Correlation between results [...] and academic class standing was
negligible. [...] no more intercorrelated among themselves [...]
other students of [...] matters with labor[...] devices were single
tests, not [...] The now usual [...]
United States was met by negative results and this particular line
of development lapsed, in what may be, for convenience, referred
to as the laboratory period in psychological testing.


EARLY LACK OF CONCERN FOR OBJECTIVE TESTING 


It is pertinent to pause to consider psychologists’ views in relation 
to this search for objectivity. It is probable that the question of 
objectivity did not concern them because of the origins of test 
materials in the laboratory. Reaction time devices measure reaction 
time; learning nonsense syllables is learning. The process measured 
was defined by the material, just as in the laboratory today one does 
not ask, “Are we really measuring learning?” when serial learning 
lists are exposed or maze paths are threaded. These measures had 
what we now would call content validity. Content validity is the 
degree to which the test samples the universe of content specified, 
as in an achievement test and in the usual measures for the ex- 
periments in learning. The step from measuring reaction time to 
using it for the measurement of intelligence “because intelligence 
calls for speedy reaction,” seemed plausible but of no great theo- 
retical moment. It was not then seen that it was a great leap from 
observed behavior to construct. 

By and large, the question of objectivity was not verbalized dur- 
ing these years. Psychology, after all, was still the study of mental 
structures or functions, and introspection the method for advancement
of psychological knowledge. For example, Whipple (37), in
his authoritative and widely used test manual of 1910, does not mention
objectivity. He does, however, speak of standardization of
conditions, which is conducive to objectivity. Familiarity with 
instructions and their clarity were also stressed. General knowledge 
of the literature and an inspection of some of the textbooks of the 
period, however, did not reveal discussion of objectivity as a topic. 
After all, psychologists could not use “objective” in referring to a 
subjective science. But this does not mean they were unaware of
the problem. In fact, the centuries-old question of the personal 
equation which, years later, Wundt’s students and others investi- 
gated, is a recognition of precisely this point. So, too, is the psy- 
chologist’s fallacy of James, the confusion of the personal standpoint 
with the mental facts. Training of introspectors served the same 
function of increasing what we could call objectivity.


INTELLIGENCE TESTING


It is tempting, but not particularly germane to the major theme, to
turn now to the work of Binet who, after working with similar
sensory and motor tests with similarly unproductive results, eventually
found in more complex or higher mental functions a means
to measure intelligence. But this is part of the history of intelligence
testing of 1900-1920. In the United States this history of intelligence
testing was not closely bound to personality testing, for various
reasons. Interest in and research on intelligence testing were
directly related to Binet’s efforts but the others who came after him
did not continue his systematic analytic interests. Those who followed
Binet were pragmatic and interested in the application of
intelligence tests to social matters, such as mental retardation, school
placement, and the like. But they were so absorbed with their
instruments of measurement that they were not very much interested
in problems beyond these instruments.

As the well-known definition would have it, psychologists of that
day, and for some years to come, tended to consider intelligence to be
whatever intelligence tests measured. So, in this sense, interest was
in intelligence as a global concept. And yet, what Spearman called
the anarchic theory of mental structure, a theory of extreme specificity
of mental structure and function, was the prevailing view. The
studies of William James, Thorndike, and others on transfer of training
had fostered the view that abilities were highly specific. The
results of sensory-motor testing, described earlier, had much the
same effect. Thus, we had the practical pragmatic interests in
intelligence testing, on the one hand, and, on the other, even larger
segments of the psychological field in which abilities and traits were
viewed as highly specific.


This lack of relevance of developments in the specific areas of
intelligence testing, during these years in the United States, curiously
enough, does not seem to have a counterpart in Britain. In a
sense, the British psychologists continued more closely the tradition
of the laboratory that has been described. In part, this was due to
the impetus of Galton and a continued stress on the part of his
students on individual differences. In part, it was due to the statistical
advances in England, first under Galton and Pearson, and later
under Spearman and Burt.

Many of the tests used in Britain at the turn of the century were 
the logical derivatives of earlier sensory and motor tests, but they
also included, as measures, association tests, such as retention measures,
target aiming, card-sorting, and the like.

In an important paper published in 1904, Spearman (30) criticized
the previous methodological efforts on statistical grounds. For example,
many earlier workers had failed to use quantitatively precise
statements of the degree of correlation between tests, they did not
calculate the probable error, and they did not allow for errors of
observation. In addition, based upon his correlation between sensory
tests and estimates of intelligence, Spearman arrived at the
conclusion that “all branches of intellectual activity have in common
one fundamental function . . .” (30, p. 284). Thus were launched
the beginnings of the thinking from which, a few years later, came
factor analysis. The British psychologists saw more clearly than
their American contemporaries the reasons that early attempts at
testing had failed.

Above all, they had something positive and challenging to work 
with in factor analysis. Their general associationist background was 
conducive to continuing this tradition. They continued an interest 
in these measures, gradually including more and more material 
relevant to the higher mental processes. 

Factor analysis is a tool, by the very nature of which you cannot 
in advance tell what factors will emerge. True, the material was so 
selected as to get at intellectual function, but the nature of the 
technique required an analytic attitude. Nor were nonintellectual 
factors entirely neglected. The pioneering factor analytic study of 
Webb (86) in 1915 was based on ratings and yielded a factor which 
seemed to be strength of character or will, called w. Burt (4), the 
same year, briefly reported on the interrelation of ratings of emotions.
But now we must return to developments taking place in the
United States in the 1910's and 1920's.


BEHAVIORISM AND OBJECTIVITY OF MEASUREMENT 


The appearance of Behaviorism, with its militant espousal of an 
objective approach, had a profound effect on psychological thinking. 
For our purposes, it may be dated by the appearance of the work 
of John B. Watson, beginning in 1913 with his articles and culminating
in his 1919 publication, Psychology from the Standpoint of a
Behaviorist. Mentalistic terms, including “subjective,” became
epithets. The Russian reflexology, which came into being in the
immediately preceding years, is sometimes referred to as “Objective
Psychology,” after the book by that name published in 1910 by
Bekhterev.

Psychologists then found in objectivity a standard of science. No 
longer did they have to struggle with mediate and immediate ex- 
perience, dependent and independent experience, and the differ- 
ences between the objective science of physics, and the subjective 
science of psychology. They could then use “objectivity” proudly,
as we do to this very day. In the recently received copy of the 
supplement to the Psychological Review, there was a “Glossary of 
Some Terms Used in the Objective Science of Behavior,” by Ver- 
planck (34), who did not even find it necessary to define objective 
among the many, many terms covered.

The spirit of the times or the Zeitgeist, a term popularized by 
E. G. Boring, had prepared the way for the appearance of an in- 
terest in performance tests. An interesting example is the Will- 
Temperament Test, a behavior measure, which fitted in with the 
times. In 1919, June Downey (11) introduced a test for the measurement
of what she called will-temperament. Its nature was intriguing,
consisting largely as it did of handwriting samples under
different conditions and, thus, behavioral in nature. A sample of
writing was obtained at “ordinary” speed (for a baseline), as rapidly
as possible (to get a comparison with ordinary speed on the theory
that those writing much slower than they can are subject to a load
or inhibition), in a different style (to measure flexibility), as slowly
as possible (to measure motor inhibition or control) and so on. The
test appealed to the desire for objectivity; so it was met with enthusiasm.
It was given trial after trial until about fifty studies were
performed, despite almost uniformly negative results from the beginning.
It was as if such a behavioral test as this could not fail to
work, just because it was a behavioral approach. The test is also
important as the first major performance measure of personality.

Another early performance measure of personality was Voelker’s (35)
study of “moral reactions to conduct.” This was followed in 1923 by
the Character Education Inquiry, with which we associate the names
of Hartshorne and May, and which produced well-known performance
tests of trustworthiness, helpfulness, [...] and persistence. These
tests are too well known to pause over them. Their finding
of the low correlation between specific measures helped to accentuate
an era of considerable skepticism about tests as measures
of personality, or that which Spearman referred to as the anarchic view.
Behaviorism, itself, with its emphasis upon S-R bonds and personality
as a bundle of habits, had helped to bring about this
skepticism concerning tests on the part of many psychologists. The 
results of Hartshorne and May, even though behavioral, however, 
were another invitation akin to that furnished by the earlier results 
of Wissler and Sharp, to see testing in a skeptical light. 

Performance tests are logically and chronologically related to 
assessment procedures of the miniature life situation sort. They, too, 
have a long past despite their short history. The earliest statement 
of the potentials of this method is probably that of Galton (19). In 
1884 he wrote: 


Emergencies need not be waited for, they can be extemporized; traps, as it 
were, can be laid. Thus, a great ruler whose word can make or mar a subject’s 
fortune, wants a secret agent and tests his character during a single interview. 
He contrives by a few minutes’ questioning, temptation, and show of dis- 
pleasure, to turn his character inside out, exciting in turns his hopes, fear, zeal, 
loyalty, ambition, and so forth. Ordinary observers who stand on a far lower 
pedestal, cannot hope to excite the same tension and outburst of feeling in 
those whom they examine, but they can obtain good data in a more leisurely 
way. If they are unable to note a man’s conduct under great trials for want of 
opportunity, they may do it in small ones, and it is well that those small occa- 
sions should be such as are of frequent occurrence, that the statistics of men’s 
conduct under like conditions may be compared. After fixing upon some par- 
ticular class of persons of similar age, sex, and social conditions, we have to 
find out what common incidents in their lives are most apt to make them betray 
their character. We may then take note as often as we can, of what they do 
on these occasions, so as to arrive at their statistics of conduct in a limited
number of well-defined small trials (30, p. 182).

He goes on to offer specific suggestions, such as the following:


The poetical metaphors of ordinary language suggest many possibilities of 
measurement. Thus when two persons have an “inclination” to one another, 
they visibly incline or slope together when sitting side by side, as at a dinner- 
table, and they then throw the stress of their weights on the near legs of their 
chairs. It does not require much ingenuity to arrange a pressure gauge with 
an index and dial to indicate changes in stress, but it is difficult to devise an 
arrangement that shall fulfill the threefold condition of being effective, not 
attracting notice, and being applicable to ordinary furniture. I made some rude 
experiments, but being busy with other matters, have not carried them on, as 
I had hoped (30, p. 184).


In view of the date in which this was published, 1884, it would
be possible to argue that this was the first proposal for an objective
personality measure.


MODERN ASSESSMENT METHODS


Modern assessment procedures, as in the stress interview, [...]
procedures, and the Michigan VA trainee study, are apparently
moving out of a period in which they were enthusiastically accepted
and tried, to a period of skepticism about them. If there is to be a
period of synthesis, it is too early to predict its nature.


Personality Questionnaires


I shall now turn to personality questionnaires in this search for
objectivity. During World War I, Woodworth’s Personal Data
Sheet, or, as it was later called, the Psychoneurotic Inventory, was
developed. It contained 116 [...] symptoms of neurotic patients [...]
called for and were [...] development and application of [...]
[...]lane (22), in a criti[...] mental measurements and [...]
questionnaires or in [...] commercially available personality [...]
appeared. A tremendous number of psychologists (including the
present writer) helped to develop these get-knowledge-quick devices [...]

[...] question that personality questionnaires had [...]
period of enthusiastic acceptance. Their appeal was to be found in
their partial objectivity, [...]; a score could be derived on which
inde[...] factor, however, to which [...] given. These questionnaires
are subjective in that they require interpretation of the meaning of
the questions asked by the tester.
Interpretive subjectivity for the person taking them is rampant
in most personality questionnaires. Consider an early study of 
Benton (3). He interviewed subjects after completion of question- 
naire items, as to what they thought was meant by the items. He 
found, for example, that the item, “Do you take pride in your physical 
appearance?” was answered as if the question meant, do you always 
feel proud, sometimes feel proud, are you always careful, and are 
you sometimes careful of your physical appearance? Similar results 
have been found by others. 
Instead of dealing with other more detailed and significant findings,
let me indulge in an anecdote from personal experience. The
psychological interview on the receiving line in a Naval Recruit 
Training Center during World War II partook of the quality of a 
verbally administered personality questionnaire, since sheer press 
of time did not allow using that distinctive characteristic of the 
interview, the follow-through probing of replies. Enuresis was a 
rather common and disturbing symptom and, consequently, precious
time was taken to inquire about it. An affirmative or a negative
reply to the question “Do you wet the bed at night?” had to be
checked, since “Yes” might mean, “Yes, because fifteen years ago 
at the age of six I had an accident,” while “No,” might mean “No, 
I haven't for two nights in a row.” Wording “When did you last 
wet the bed?” was found to increase objectivity in that a better 
understanding of the intent of the question followed. In this pedes- 
trian, minute improvement we can see how objectivity improves. 
Sometimes personality questionnaires are criticized as if objectivity
were an absolute. In view of Thurstone’s work with the
personality questionnaire, it is of interest to note that this is the
position he took. He asserts flatly that such questionnaires are not
tests in any strict sense since tests are “. . . objective procedures”
(821, p. 353), with the implication that questionnaires are not. One
may sharply separate tests from questionnaires, as Cattell does,
without denying questionnaires some objective status. To Thurstone’s
position, one may take exception, as I have tried to do in
arguing that objectivity is a relative matter.

Most personality questionnaires, in my opinion, have proven to
be unsuccessful in their tasks as scientific instruments. The indictments
by Ellis (12) and Ellis and Conrad (13) in large measure
seem justified. One may, however, argue on certain points, such
as classifying all questionnaires together, when a breakdown by
the particular instrument may show more encouraging results. The
Minnesota Multiphasic Personality Inventory has fared [...]ly
better in its reviews than other instruments. [...]
particularly the MMPI, probably owe their
continuing use and expanded value, despite the general failure, in
part, to the development of specific means of increasing objectivity.
The Lie Scale is one such device. In addition to indices of increasing
objectivity with the MMPI, there is the intimately related fact
that it is a more complete, complex, and intricate instrument than
many of the other personality measures.

Psychology is a broad science and other influences, [...]
running counter to the prevailing Zeitgeist (perhaps, representing
still another trend), which appeared in some measure in the [...]
twenties reached a considerably higher peak of visibility in the
mid-thirties. I refer, of course, to projective testing.


Projective Techniques 


[...] monograph (2). He, [...] David Levy. [...] projection, despite
its preeminence [...]

Inkblots, themselves, are nothing new or startling. Indeed,
Leonardo da Vinci (8) [...] a sponge full of [...] one might see
[...]. So far as psyc[...] concerned, inkblots were [...]
and Henri (36). Dearborn (9, 10), two years later,
published material on inkblots with a small group of
Harvard professors and students.


. . = 
a projection as a method of testing arose 193) 
th Me). and independently. In 1935 Murray (2 
published with Morgan his first paper on the Thematic Appereeption 
Test. Sears (28) published the first of his papers on experimen mM 
studies of Projection in 1936. In Britain, in the same year, Ca Int 
(7) published his Guide to Mental Testing, including a descrip? 
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of a projection test. Since it is probably not too well known to 
American audiences, it will be described briefly. He called it a 
‘projection test,” which, I believe, is the first time the term was 
applied directly to a test. It consists of 74 items in which each item 
has three alternatives as in the following instance (88, p. 71): 
John strained every nerve to beat the others because: 

he was determined to be top 

his father wished him to succeed 

he needed the scholarship. 
The most appropriate of the three endings was to be checked on the 
assumption that one person will project his own chief impulses onto 
the ending chosen. If self-assertive, he would be likely to choose 
the first; if submissive, he would prefer the second. The items were 
developed so as to give scores on Self-Assertive versus Submissive, 
Cautious versus Bold, Acquisitive, Gregarious, Curious and De- 
pendent tendencies, adapted from McDougall’s list of instincts. 
Standardization was not carried out and not too much use has 
been made of the test. In today’s perspective, it could be called a 
multiple-choice, sentence-completion test. It is pertinent to my 
theme to indicate that, if standardized, this could have been an 
objective test in the sense that scoring was objective. The more
detailed and explicit formulation of the projective hypothesis, that of
L. K. Frank (16), appeared in 1939. This was the first major source
of knowledge of projection which became well known to psychologists.
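
The sense in which a fixed key makes the scoring of such a multiple-choice test objective can be shown in a minimal sketch, written here in Python. The key is hypothetical: only the keying of the first two endings of the sample item follows the description above. The item number, the names SCORING_KEY and score_protocol, and the decision to leave the third ending unkeyed are assumptions made for illustration and are not the published key.

```python
from collections import Counter

# Hypothetical scoring key: (item number, chosen ending) -> tendency.
# Only the first two entries follow the text; the third ending is left
# unkeyed here because the text does not say how it was scored.
SCORING_KEY = {
    (1, 1): "self-assertive",
    (1, 2): "submissive",
}


def score_protocol(choices, key=SCORING_KEY):
    """Tally keyed tendencies for one subject.

    `choices` maps item number -> number of the ending checked.
    Endings absent from the key contribute nothing to the tally.
    """
    tally = Counter()
    for item, ending in choices.items():
        tendency = key.get((item, ending))
        if tendency is not None:
            tally[tendency] += 1
    return dict(tally)
```

Because the tally depends only on the checked endings and the key, any two scorers applying the same key to the same record must arrive at the same scores, which is the limited sense of objectivity meant here.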

Thurstone considered the projective procedures “the nearest
approach to personality tests” in revealing personal idiosyncrasies.
He asked only that it be unstructured for the subject but well
structured for the psychologist since with this structure it could
be objectively scored. However, he went on to add, if the
interpretation were as unstructured as the test, it would be useless for
scientific inquiry. Structure, from the point of view of the examiner,
may be equated with objectivity. Rorschach inkblots seem highly
subjective, but experts can prepare independent interpretations
which agree on essential particulars. This form of objectivity is
one of the grounds on which the Rorschach is defended.
The influence of projective techniques upon objective personality
testing involves diametrically opposed influences. On the one hand,
this kind of measurement increased subjective, impressionistic,
intuitive trends in psychology. The heady wine of its
multi-dimensional character; its relation to dynamic theory,
particularly psychoanalysis; its enthusiastic reception by psychiatric
colleagues; and its usefulness in clinical settings all contributed to
this anti-objective trend. Yet, without its challenge, psychologists
would probably not have seen the possibilities of broadening
the scope of objectivity to include improving the objectivity of the
psychologist himself. In considerable measure, whether we make
use of projective techniques sympathetically or as irritants, it has
forced us to re-evaluate and broaden our meaning of objectivity.


The many validity studies of the Rorschach that have produced
negative results have given, for the third time in the last fifty years,
an excuse to be skeptical of testing, this time of projective
personality testing. Since this is a current skepticism, no one can say
“what happened next.” But something will happen, and I
suggest that it will be further objectification of projective
techniques, but without disregard of the complexity and subtlety these
devices permit. In one of the later chapters of this book,
Holtzman focuses attention on the problems of objective scoring of
projective techniques, while preserving their underlying purpose.


WHAT IS AN OBJECTIVE TEST?

Now that the historical survey has been completed, I would like 
to consider present-day thinking about objective personality testing.

In order to be able to present a cross-section of present-day
conceptions of objective personality testing, the authors of this
symposium indicated what they considered to be the meaning of
“Objective Approaches to Personality Assessment,” with special
emphasis upon the qualifying term, “Objective.”

Bass considered objectivity to be complete independence from
examiner effects, or as he also put it, zero variance due to the
examiner. Berg referred to scorable [...] which scoring would be
identical if [...] Edwards emphasized [...] McQuitty considered it
[...] users of the approach. Pepinsky, in relating objectivity to an
approach to personality testing, disclaimed [...] believed what is
meant is two-fold: (1) minimization of errors in
observing and recording, and (2) minimization of variability in the
task conditions on separate occasions (not, he adds, in minimizing
stimulus ambiguity or uncertainty for the subject).
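
Bass's phrase, zero variance due to the examiner, can be given a rough numerical illustration. The sketch below, in Python, assumes that several examiners have each scored the same subjects and takes the variance of the examiners' mean scores as a crude index of examiner effects. The function name examiner_variance, the data layout, and the choice of index are assumptions made for illustration; the sketch is not a procedure attributed to Bass and does not replace a full analysis of variance.

```python
from statistics import mean, pvariance


def examiner_variance(scores_by_examiner):
    """Return the variance of examiner mean scores.

    `scores_by_examiner` maps an examiner label to the list of scores
    that examiner assigned to the same subjects, in the same order.
    A value of zero means the examiners' averages are identical.
    """
    examiner_means = [mean(scores) for scores in scores_by_examiner.values()]
    return pvariance(examiner_means)
```

For example, two examiners who assign identical scores to three subjects yield an index of zero, while an examiner who scores every subject two points higher than a colleague produces an index above zero even though the subjects are ranked identically.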

In varying degrees and either implicitly or explicitly, many of
these statements stressed not the test alone, but objectivity as a
matter of interaction of test material and examiner. Objectivity is
localized not only in the material, but also in the examiner.
Objectivity is not the same as numerical scores or impersonal records,
although this is one way objectivity is expressed. Blood-pressure
records are numerical and x-ray photographs are objective, but
both are open to subjectivity of interpretation.

It must be remembered that these were succinct replies to a 
question. It does not follow that the respondents would not agree
in some instances with expansion of the meanings they specify, or
even agree with other, more extended ways of putting the
matter. I will also add that they reserve the right to disagree,
violently or otherwise, with later remarks, either of my own, or of
the other participants. I shall now proceed to summarize the
somewhat more lengthy statements.

Cattell gives a more specific meaning than do the others, drawing
upon his glossary to Personality and Motivation Structure and
Measurement, wherein he defines an objective test as follows: “A
test in which the subject’s behavior is measured, for inferring
personality, without his being aware in what ways his behavior is likely
to affect the interpretation.” To leave no doubt about their
differentiation from questionnaires and the like, he further expands in
the text as follows: “It is a portable, exactly reproducible, stimulus
situation, with an exactly prescribed mode of scoring the response,
of which the subject is not informed. All objective tests are also
experimental measurements, but not all experimental measurements
are tests.” The difference of T data from Q data resides in the last
clause, for “the response cannot be deliberately self-evaluative and
self-revelatory if the subject is not told how his response is going
to be evaluated.” Cattell would, thereby, rule out personality
questionnaires as tests, but not, of course, as personality measures.
Super goes beyond this discussion, to speak of a test as objective
in any one or more of three ways: “(1) its stimulus, (2) the response
which it permits, and (3) the scoring method used.” He continues:


The quality of objectivity is one of clarity of structure; in this sense the
objective-subjective continuum is equivalent to the clarity-ambiguity dimension
and the structured-unstructured dimension. This means that a completely
objective test is one in which the stimulus has the same significance for
all subjects, the responses which he may make are limited in number and
in meaning, and the scoring leaves no room for judgment by the scorer. By this
definition, the tests we actually use are scattered along a continuum, and any
judgment as to whether a particular test is objective or otherwise is somewhat
arbitrary.

As I see it, Super is saying that objectivity is either clarity or
structure, suggesting that these terms are more clear in the present
context than is objectivity, though not denying meaning to
objectivity. His personal preference, he goes on to state, is the
structured-unstructured continuum, so far as classification of tests is
concerned.


Hunt, as might be expected from his present interests in
objectification of clinical impression, does not limit himself to test
settings. He writes:


I interpret “objective” as pertaining to “public” rather than “private”
information. Thus the data of introspection are private until turned into some
form of report when they thus become public, since the forms of report, verbal
or behavioral, can be handled as public, verifiable (by others) phenomena.
“Objective” has many parameters, loosely the clarity and specificity of
definition of the phenomena, its duplicability, its control for experimental
observation, its statistical amenability, etc.


This is a still broader definition, and very close in spirit to Feigl's
account of intersubjective testability.


Hathaway gives the most detailed analysis, but I am taking the 
liberty of quoting him verbatim. 


I believe I am correct in placing our local emphasis in definition of the word
“objective” upon the qualities of reproducibility and most of all upon the
absence of an intervening interpretation between behavior of the subject and the
material available to a third person. Data are objective when they are
transmitted directly from the subject to others who may then interpret them. The
verbatim responses to Rorschach cards are objective items, but they become
something else (loosely, improperly called subjective) when they are classified
or in any other way characterized by the examiner. The MMPI items checked
by a subject constitute objective information, and these remain objective when
put into scales. Discussion of the meaning of the scales or profiles is no longer
objective. A TAT story is an objective item when presented verbatim but loses
objectivity as soon as any interpretation or condensation or expansion occurs
on the part of the examiner. There are intermediate situations. A series of
experiments may establish that certain Rorschach responses occur with a
greater frequency than others. Preserving objectivity, one could then classify
a given response or set of responses [...] The frequency [...] however, [...]
elements of [...] The examiner could exercise freedom
and call a response either a frequent one or an infrequent one. When such
examiner freedom enters the situation, the resulting score loses objectivity.
Similarly, an MMPI administered under special deviant conditions that may
not be men[...] of the objective d[...]
I have not developed the idea of reproducibility, but it is inherent in what
I have been saying. One does not require that the objective score be
reproducible in exactly the usual sense of reliability but rather, that there be
possible a hypothetical construct representing a reproducible element in the
subject. This construct is not possible if the product of the situation
represents some interaction of the examiner with the subject or some aspect of
the examiner's psyche.
Most projective devices (and I tend to use “test” as almost completely
implying objectivity) have traditionally been much less objective than devices
that permit the patient to make responses that can be treated by clerical means.
It is unfortunate that objectivity has been tied to “paper and pencil” and to the
idea of formulated items such as in the MMPI. I believe that we would benefit 
from the attempt to extend objective measurement to include not only the ob- 
jective aspects of projective devices, which has been partly developed, but also 
objective ways of treating interview material and free behavior as this may be 
observed by others. 


PROJECTIVE AND OBJECTIVE TESTS ARE NOT DICHOTOMOUS


Interest in searching for objectivity is by no means confined to 
the authors of this book. A considerable variety of opinion has been 
expressed elsewhere. One that I consider especially pernicious, 
when the distinction is made without qualification or explanation, 
is that between objective and projective tests, treating them as if 
they were mutually exclusive. One rather widespread systematic 
error has been to contrast projective tests with all other tests to 
which, unfortunately, we have sometimes applied the undeserved
label of objective tests. For example, in some volumes of our Annual
Review of Psychology, two of the major sections on diagnostic testing
have been labelled projective and objective. Many “objective” tests are
not objective in any of the senses in which we find the word to have been
used, and projective test materials may be treated objectively. If
we must have only projective tests and something else, which I do
not believe is the case, the lame category of nonprojective is a shade
better, because, at least, it does not make an invidious comparison.

There has been some involvement, spurious in my opinion, in the 
question of objective versus projective, in relation to the nomothetic 
and idiographic approaches. Beck (2) has asserted that objective 
tests are limited to the “subpersonality” in the course of discussing 
the question, or pseudo-question if you will, of the idiographic and 
nomothetic approaches. It may be that projective tests have more
adherents from those with an idiographic approach and objective
tests have more from the nomothetic camp, but it does not follow
that objective tests cannot be used as measures of personality. To
be sure, single objective tests or test items will certainly fail in this
task. Factor and pattern analyses, as in the Cattell and McQuitty
approaches, are surely approaches to the total personality, although
with the inevitable loss of individuality or uniqueness that
accompanies any theoretical formulation of personality, including that of
projective formulations.


WHAT IS TESTED?


White’s discussion (38) of what is tested by psychological tests is
pertinent. He points out that psychological tests can no longer be
regarded as inducing specimens or samples of performance of
restricted functions. The samples may be conceived of as inducing,
say, problem-solving capacity, but many other characteristics of
personality also contribute. He argues that we can never have a
situation in which one variable alone is tested. For example, the
problem-solving measure may also tap frustration tolerance, anxiety
control, and level of aspiration. Tests consequently supply
overlapping information. In line with this, White goes on to propose that we
must use test batteries since we can no longer pin our faith on
single tests. By use of a test battery, there is an increase in
knowledge, not merely in an additive fashion, but in geometric
progression. In the same vein he proposes multiple examiners. He speaks,
in this connection, of psychological tests not yet being so objective
as to dispense with this safeguard. In a sense, then, the whole
discussion is a plea for objectivity, but objectivity at a high enough
level of complexity so as not to do violence to the complexity of
personality.


This point of view can be seen in contrast with the position of
factoring psychologists who are interested in purifying their
measures so they are free of what could be called contamination. But one
man’s contamination is another man’s extra premium of subtlety.
A not inconsiderable group of psychologists accept this position [...]

[...] personality measures into objective or overt, subjective or covert and
projective or implicit levels of reference. He, however, explicitly 
indicates that they are not too exclusively associated with one or 
another of the diagnostic methods. In fact, the same instrument 
may supply information at all three levels. They are not defined so 
as to give objectivity to only one level. Campbell (5) has developed 
a classification of tests based on three dichotomies. In the first 
dichotomy he contrasts objective tests for which the subjects under- 
stand there are correct responses, and voluntary tests in which, in 
one fashion or another, the subjects are informed that there are no 
right or wrong answers. The other dichotomies are direct versus 
indirect, having to do with the subject’s understanding of the
purpose of the test, and free-response versus structured, having to do
with the usual distinction made between them, but from the point 
of view of the subject. He would classify tests in terms of these 
three dimensions simultaneously, as in the voluntary, indirect, free 
response type, which would include the Rorschach and the TAT, 
and the voluntary, direct, structured type which would include the 
MMPI, and so on. His use of objective runs counter to several of 
the meanings of objective we have considered. In large measure, 
this arises from his use of objective in a phenomenological orienta- 
tion—the phenomenologically objective environment. Accuracy and 
error, as he says, are in the subject’s mind. Rosenzweig, in contrast, 
refers to the psychologist’s orientation, not the subject’s. Perhaps
a classification uniting both the subject's and the examiner’s
frame of reference will give us an even more adequate classification
than do Campbell's and Rosenzweig’s when considered separately.
I am reminded in this connection of George Kelly’s witty remark
that when the subject is asked to guess what the examiner is
thinking, we call it an objective test; when the examiner tries to
guess what the subject is thinking, we call it a projective device (20).


Time limitation makes impossible an exhaustive survey of the
modern literature relevant to the question of objectivity
in personality testing. Certain other selected references
might be mentioned. Frank (17) contrasts the psychometric and
projective approach. In the context of norms for the TAT, Rosenzweig
(26) gives a discussion of how they help in the process
of objectification. Levinson (21) compares and contrasts projective
and nonprojective personality tests. Rapaport (24) discusses the
principles underlying projective tests of personality. Allport (1)
considers the ad[...]
[...] IN PSYCHOLOGY,” writes
Jensen in the 1958 edition of the [...]
p. 295) “are seldom disproved: [...]
is rather humbling [...]
respective authors suggest, for rigidity is still not unfashionable
and the authoritarian [...]
been driving us for [...]
and expressive movement, [...]
our journals, in order to make clear the validity
of the notion that the theories of personality that underlie our tools
and techniques, and the instruments themselves, tend to have something
of the enthusiastic and transient support of fads. In fact, Jensen
(17, p. 306) was moved to write a formula for a psychological fad, 
which consists of “making available an easy-to-use measuring device 
with a significant label and fascinating content.” The reasons for 
this seem clear: a readily used and fascinating tool prompts people 
to make use of it. When it has some bearing on a current theory, 
this use becomes not only interesting but academically respectable 
and even, perhaps, prestigeful. 

Perhaps one reason why theories take on something of the nature 
of the fad is that the instruments used in studying them at first 
seem relatively simple, suggesting that one can readily test the 
underlying theory. In due course, research proves that neither the 
instrument nor the theory is as simple and straightforward as it 
seemed, and the fringe researchers drop them, moving on to a 
newer theory and a newer test. And so, as Jensen says, the theory 
does not die, it fades away. That the fading process is only relative, 
but not complete, is shown by the emphasis put on introversion- 
extraversion in Eysenck’s current work (9). An historical example 
is the revival of interest in the expressive movement, resulting from 
Allport and Vernon’s work twenty-five years ago (2), about ten years 
after Downey’s work with it (7) had actually faded. 

I feel humble in dealing with theories for another reason, for I 
work, not as a theorist, but as an empiricist. As an empiricist, how- 
ever, I feel the need to organize facts, to see what they add up to, 
and to be guided in planning my further work by the perspective 
thus gained. This, I have found to my surprise, makes me, in the 
true sense of the term, a theorist. For theory, so I am informed 
by theoretical theorists, is nothing more than the attempt to explain 
the relationships between sets of facts. I do that, and so do all of
us, more or less consciously and in more or less sophisticated ways. 
The result is a rationale, a set of assumptions, underlying every
instrument we use, every technique we employ. These may not be
integrated into theoretical systems, but they constitute theories.
Perhaps here is the difference between the theorist as we generally
conceive of him and the empiricist: the former constructs a system,
the latter neither builds nor adopts a system but gets along with a
less well organized body of related facts. 

In preparing this paper, then, I found that I was dealing, not with 
theoretical systems, but with the limited theories, with the more
specific assumptions and hypotheses underlying methods of
assessing personality. Most test authors have not consciously developed
their tests in terms of a systematic theory of personality; rather, they
have developed their instruments on the basis of specific
hypotheses concerning the ways in which personality manifests
itself. Underlying these hypotheses, of course, are
theories as to both the structure of personality and its content.
The structure issues have generally been those of the organismic
whole versus the atomistic constellation of traits; the content
issues have been questions concerning the traits that make up
the organism or the constellation.

Since our concern is with the measurement of personality rather
than with personality theory as such, and since I am a
measurement practitioner but certainly not a personality theorist, I
shall address myself to the first type of theory, that of lower-order
theories and hypotheses governing personality measures. And, since
our focus here is on objective approaches, I shall deal at greater
length with that end of the continuum, treating the more subjective
approaches only enough to place all of these approaches in perspective.


THE OBSERVATION APPROACH


Performance. One approach to personality assessment is through
observation of performance, of the personality in action. It assumes
that people are what they do. The medium in which the personality
characteristics manifest themselves is thus overt, observable
behavior; the method of assessment is observation; the measure
resulting is a characterization or a rating. Let us look briefly at each of
these categories.


Media. The medium in which the performance takes place may
be either of two types: a life situation, or a miniature, that is,
artificial or manufactured, situation. In the life situation the
basic assumption is that significant personality traits manifest
themselves in everyday behavior, and that observations of this behavior
may be recorded to obtain a meaningful picture of personality
functioning. In the miniature situation test the assumption is that
one can set up situations that bring out important traits more
quickly and more completely than in everyday life, for closer
observation by more highly [...]

[...] behavior, noting it, classifying it [...]

Measures. These ob[...]
In sociometrics, participants in the situation record their choices of
friends or companions for real or fictitious activities, thus in effect 
summing up their observations; it is assumed that people are aware 
of their differential preferences and are willing to express them 
under appropriate conditions. In nominations, participants name 
the persons whom they believe fit each of a series of labels, roles, 
or personality descriptions; here, too, they in effect summarize 
observations of performance. The assumption is that people notice 
individual differences, particularly differences in roles played in 
social groups, and are willing to share these observations. In ratings, 
either participants or observers rate participants in the situation for 
characteristics that are believed to be important and that are con- 
sidered likely to manifest themselves in such a situation. The as- 
sumption is that the rater will be able to identify the trait or 
behavior in question, and be able to make a judgment concerning 
the frequency or degree to which the subject manifests it. In be- 
havior frequency counts, the observer records the incidence of types 
of behavior or actions which are expected to have significance, the 
assumption being that the frequency of types of behavior in that 
situation is indicative of behavior tendencies in other situations 
and reflects underlying personality dimensions. 
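
The behavior frequency count described above amounts to tallying observed acts within categories fixed in advance. The minimal Python sketch below illustrates the idea. The category labels, the name RECORDED_CATEGORIES, and the function frequency_count are hypothetical and chosen only for illustration; they are not taken from any of the coding schemes cited in this chapter.

```python
from collections import Counter

# Hypothetical coding scheme: only acts in these categories are recorded.
RECORDED_CATEGORIES = {"asks question", "gives suggestion", "agrees"}


def frequency_count(observed_acts, categories=RECORDED_CATEGORIES):
    """Count how often each recordable category occurs.

    `observed_acts` is the sequence of acts noted by the observer.
    Acts outside the coding scheme are ignored.
    """
    return Counter(act for act in observed_acts if act in categories)
```

Fixing the categories before observation begins is what gives the count its objectivity: the observer decides only whether an act belongs to a category, not what the resulting frequencies mean.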

Evaluation. It would be worthwhile, but this is neither the time 
nor the place, to evaluate in detail the construct and predictive 
validity of each of the types of measures and media used in assessing 
personality through performance. At the risk of overgeneralizing, 
and without citing evidence which has been well summarized by 
Heyns and Lippitt (14), by Bass (4), Lindzey and Borgatta (24),
Flanagan and others (11), and by Hollander (15), I shall nevertheless 
express in a few brief statements my understanding of the status 
of each of the measures and media used, of the method (observa- 
tion), and of performance tests of personality in general. 

First, the measures. Behavior frequency counts have generally 
proved useful when appropriate dimensions of behavior have been
identified for personality assessment, as pointed out by Bass (4), the
Ohio State group (31) and others working on the initiation of struc- 
ture. Ratings have proved useful as global measures, but not as 
measures of specific traits, for global measures seem to reflect either 
the success of or liking for the subject. Nominating and sociometric 
techniques (15, 24) have stood the test of experimentation rather 
well, the former for the assessment of social roles and the latter, like 
ratings, for the assessment of social acceptance. 

Secondly, the method. Observation puts a premium on the person
doing the observing if little structure is provided, on rationale and
training if much structuring is done (14). In this method, the person
is the instrument more than the devices which he uses to record
his observations when structure is minimal, less than his devices
when his procedural directions are so refined as to make a machine
of him. The question has typically been that of the clinician as a
tool (28), a topic of considerable current interest with which others
will deal later in this book. As Heyns and Lippitt point out in the
Handbook of Social Psychology (14, p. 403), making the clinician
rather than his procedures central is a mistake in the use of
observational methods.


Finally, in this brief evaluation of observation of performance in
the assessment of personality, we come to the media, the life
situation and the miniature situation. The former has the advantage of
realism, for here the test is life itself; but it has the disadvantages
arising from the psychologist’s inability to control the situation.
The sampling of behavior may be poor, and conditions may make
the recording of observations difficult, as exemplified in studies
made of personality factors in survival. In those studies observations
of any one air crew in the Strategic Air Command living under
survival conditions could be made under either winter or summer
conditions, but not under both. Observers could watch all of the
crew some of the time and some of the crew all of the time, but
not all of the crew all of the time. The miniature situation test,
on the other hand, tends to lose verisimilitude both in content and
in the motivation of participants for some purposes, although for
others verisimilitude can be achieved. The leaderless group
discussion can, for example, capture both the content and the spirit
of many executive situations.


PERSONALITY PROJECTION


Projection. [...] examinee [...] of projection [...] of projection. They
are [...] type has its devotees, so much so, in fact,
that we might follow George Kelly (20, p. 335) in describing them as
the fluid-blot people, the human-picture people, and the disjunctive- 
sentence people. The media differ in two ways. They differ in the 
amount of structure provided, as in the amorphousness of an inkblot 
and the focus provided by the sentence-stem, as well as in the use 
of stimuli which may evoke either impersonal or personal content. 
Inkblots, for example, are more likely to elicit impersonal responses 
than are the cartoons of the Rosenzweig PF test. 

Method. The method of test scoring and interpretation used in 
this kind of assessment may be contrasted with the recorded ob- 
servation method used in situation tests. The observer, we have 
seen, is a cross between a clinical instrument and a machine because
he collects, sorts, and records data; in projective testing, on the 
other hand, it is the medium, the test, that collects, sorts, and pro- 
vides a record of responses. Since important further sorting of data 
takes place later in both types of testing, it may be important to 
illustrate what I mean here. In situation testing, the observer makes 
decisions as to what behavior to record, sorting out responses such 
as sneezing as “not-to-be-recorded,” and others, such as asking a 
neighbor a question, as “to-be-recorded.” In projective testing, the 
test, not the examiner, does the first sorting, for the directions ask
the examinee what the inkblot might be and the examiner merely 
records the responses to that stimulus; this is even clearer in the 
case of incomplete sentences tests, in which the examinee reads 
each question and writes his response himself. Here the examiner 
makes no decisions as to what behavior to evoke or to record. The 
examiner plays his important part in analyzing the recorded data, 
both in the inquiry of a Rorschach examination and in the scoring 
of Rorschach, TAT, and Incomplete Sentence Test protocols. This 
may or may not be done by the observer in a situation test, but 
the scoring processes are basically similar in both situational and 
projective tests once the data have been recorded. They may be 
quite objective, as in the identification and counting of forms or 
of structure-initiating responses, or they may be very subjective, as 
in the making of a global rating of leadership promise. Scoring may 
use a Gestalt approach and focus on stimuli to which responses 
are made, as in scoring form and color responses in the Rorschach, 
or it may use a psychoanalytic approach and deal with the content 
of responses, as in determining sex identification in the TAT. 

Measures. This leads to the question of the measures derived 
from projective techniques of assessment, and to theories that have 
been much debated in the literature. One theory is the organismic,
which is used to justify a global interpretation of the projective
protocol. Several other types of theory, ranging from [...]ism
to psychoanalysis, are used to support the objective scoring of
the protocol through the identification and counting of well-defined
responses. These scores are in turn interpreted in terms of their
hypothesized, or occasionally their demonstrated, significance. It
would be possible to dwell on these theories at some length, but
that is not the purpose of this paper.

Evaluation. Perhaps the best way to evaluate briefly is to see 
what the evaluators say rather than by looking at each medium, 
method, and measure. We can do so by glancing at the Annual 
Reviews for three recent years. In 1954 Lowell Kelly wrote 
(18, p. 238): “The curious state of affairs wherein the most widely 
(and confidently) used techniques are those for which there is little 
or no evidence of predictive validity is indeed a phenomenon ap- 
propriate for study by social psychologists.” 

In 1956 Cronbach wrote (6, p. 173) that “Assessment in the OSS
style has now been proved a failure,” and he cites, among other studies,
the Holtzman-Sells analysis of the predictive value of clinical
analyses of projective protocols (16). He concluded that, “Assessment
encounters trouble because it involves hazardous inferences,” in
which assessors go considerably beyond known relationships between
predictor and criterion variables. He quotes Symonds and
other students of projective methods, to the effect that there is little
theoretical basis for expecting fantasy, as revealed by projective
techniques, to be directly related to overt manifestations of personality
such as academic success or work proficiency.

In the 1958 Annual Review (20) George Kelly is cautious, but the
impression resulting from the specific studies that he cites but refrains
from synthesizing is not good. Cronbach’s earlier conclusions
seem to hold, and even if it does in due course develop that pro[...],
we must ask of what [...]ques in assessment, since they
have not been shown to have predictive validity. After a review
of the research with the Rorschach (29) which led to the conclusion
that it has no validity for differential diagnosis, for understanding
conflicts or fantasies, [...] personality description,
prediction of behavior, or evaluating or predicting the outcome of
psychotherapy, Eysenck reported in 1955 (10, p. 233) that the Rorschach
was abandoned as a clinical tool at the Maudsley Hospital.

Although we have considered dropping the skill course in projectives
in my own institution, we still require that all clinical,
counseling, most people in personnel, and all school psychologists 
devote a large block of time to acquiring some competence with 
these projective techniques, the utility of which is unknown. We 
have agreed that they have no validity, but we retain the require- 
ment. We do this for three reasons: 1) the unsatisfactory but 
practical consideration that such psychologists are expected to have 
these skills and are likely to both feel and be handicapped if they 
do not, 2) the fact that they can learn something useful about clinical 
interaction by studying these procedures, and 3) the hope that
familiarity with these methods may yet provide psychologists with 
a basis for some major break-through in the field of personality 
assessment. 


SELF-DESCRIPTION 


We come now to our third and last type of personality assessment 
procedure, self-description. George Kelly (20, p. 332) describes this 
type of method as the one in which “the subject is asked to guess 
what the examiner is thinking,” as contrasted with projective tech- 
niques in which “the examiner tries to guess what the subject is 
thinking.” This is the oldest of our three types of approaches, and 
that which has most readily lent itself to experimentation and re- 
search. It best qualifies as an objective approach to personality 
assessment, being near the upper end of the structured-unstructured 
continuum. By some definitions of that term I might legitimately 
have confined my paper to this type of device. While it seemed 
wiser to try to gain perspective by reviewing all three types, I have 
saved this type for the last with the aim of dealing with it in some- 
what more detail, under two headings: trait lists and biographical 
inventories.


Media: I. Trait Lists 

The media for self-description in assessment work have typically 
been behavior or trait lists in the form of the personality inventory, 
the check-list, and the rating scale. These, according to such diverse 
students of personality as Gordon Allport (1) and Frederick Wyatt 
(35), are appropriately used because of the importance of conscious 
motivation in normally well-integrated people. They are direct 
methods, in that they ask the individual to describe himself as he 
sees himself. George Kelly (19) and Leary (22) have recently, with 
quite different approaches, made considerable use of self-descriptions.
The various underlying theories of personality are too numerous
and too diverse for discussion here, but it is relevant and
possible to comment on the general theoretical acceptability of these 
methods. 

Because self-description was found, in Woodworth’s work in 
World War I, to work reasonably well, this medium was widely 
exploited during the pragmatic, empirical, behavioristic period that 
followed. With greater sophistication in assessment and personality 
theory, self-descriptive methods came into disfavor among psychol- 
ogists. The devices developed by Woodworth, Allport, Laird, Bern- 
reuter, Bell, and others were still used, for lack of better methods 
of personality appraisal, but generally with recognition of their 
weaknesses and with some apology for not having something better. 
Thus in 1944 Maller wrote, in Hunt’s symposium on personality 
(27, p. 180), “It is the psychologist’s dilemma to choose between the 
standardized questionnaire which is broad in scope but of doubtful 
validity and the performance record which is obviously valid but 
of narrow scope.” 

The dissatisfaction with the self-descriptive inventories which 
resulted from the unbridled empiricism of the 1920's led not only 
to work with other methods but also to better empirical work with, 
and better theorizing about, self-description. Thus Guilford embarked
upon the twenty-year-long program of refinement of personality
inventories through factor analysis, which has taken current
form in the Guilford-Zimmerman Temperament Survey. Hathaway 
built on Rosanoff’s theory of personality as well as on the Minnesota 
empiricism in selecting items for, and for ascertaining the concur- 
rent validity of, the Minnesota Multiphasic Personality Inventory. 
Edwards developed the Personal Preference Schedule by combin- 
ing Murray's need theory with psychometric improvements such 
as the equating of items for social desirability, and Bills developed 
his Index of Adjustment and Values with the help of self theory both
in devising his scoring system and in designing validation experiments.


Method. Early work with personality inventories relied on a
combination of content and construct validities. Items were written
or selected because they described symptoms which were believed
to characterize various types of adjustment; this was content validity
as defined by the APA Committee on Test Standards (3). Items
were retained or rejected on the basis of internal consistency, of
agreement with the total score for the scale in question: this was
construct validity as now understood. It was generally assumed that
if the item described a symptom that had been observed to characterize
a type of maladjustment, such as neuroticism, and if answers
to it tended to agree with total score for that trait, then
validity was demonstrated. It is interesting to note that these two 
types of procedure fell into some disrepute until the APA Com- 
mittee gave them a name, and that their new respectability was not 
much dimmed by the greater stress put by that Committee on 
predictive validity!
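
The internal consistency procedure just described, in which an item is retained when answers to it agree with the total score for its scale, can be illustrated with a short computation. The sketch below is an illustration only and not the procedure of any inventory named here: the response matrix is invented, the function names are hypothetical, and the cutoff of .30 is an arbitrary assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant item or total carries no information
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses):
    """Correlate each item (column) with the total scale score (row sum).

    The total here includes the item itself, as in the text; this
    inflates the correlation somewhat for short scales.
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals)
            for j in range(n_items)]


def retain_items(responses, cutoff=0.30):
    """Indices of items whose item-total correlation reaches the cutoff.

    The cutoff is an assumed, illustrative value.
    """
    correlations = item_total_correlations(responses)
    return [j for j, r in enumerate(correlations) if r >= cutoff]


# Hypothetical data: 6 respondents x 4 items, 1 = keyed ("symptom") answer.
RESPONSES = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

print("retained items:", retain_items(RESPONSES))
```

With these invented data the first three items agree with the total score and are retained, while the fourth does not. Agreement of this kind shows only that the items hang together; it says nothing about whether the scale differentiates groups outside the test itself.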

Experience and experiment showed, however, that construct 
validity is merely suggested, not proved, by item-score correlations 
and item inspection and that it is not equivalent to empirical valid- 
ity. Scales purporting to measure neuroticism, for example, were 
found by Landis and Katz (21) to contain some items which were 
answered by neurotics in the normal way, and by normals in the 
neurotic way, more often than in the expected manner. And the 
scales were found not to have appreciable concurrent or predictive 
validity, in that they failed to differentiate effectively among groups 
of people who were known at the time of testing or later to differ 
in significant and presumably relevant respects, such as neuroti- 
cism, type of psychosis, social role on a college campus, or occupa- 
tion. This led to the decline of interest in the content and construct 
method of developing self-descriptive instruments which character- 
ized the 1940's, and to less emphasis on the assumption that people 
can and will accurately describe their own traits. 

One of the further outcomes has already been mentioned: it was 
a greater interest in empirical validation, in both concurrent validity 
and predictive validity as now understood. The best example of 
this approach is the MMPI, in which McKinley and Hathaway set 
out to devise a self-descriptive inventory that would differentiate 
between various types of maladjusted and disturbed patients, and 
between these persons and normals. But it was recognized that, in the
first writing or selection of items, it was important to have some
kind of guide, that is, a theory which would generate hypotheses
concerning differences.

A second outcome, also mentioned, was thus an emphasis on a 
higher order of theory. Theory of a very low level had been tapped 
in earlier work: lists of symptoms characterizing each group were 
examined for suggestions as to items. A higher order of theory was 
now brought into play, however, by the demonstrated weakness of 
the symptom method: it was the use of a theory of personality organization.
In the case of the MMPI it was Rosanoff’s theory that provided
a framework for the inclusion or exclusion of existing items,
and for suggestions as to additional types and traits and of items
which might be included. Item-score correlations are obtained, or
items are factor analyzed, by some investigators in order to establish
internal consistency or to validate theories concerning personality
structure. But the ability of total scores to differentiate between
criterion groups is also checked, and in addition item-criterion
correlations are often obtained in order further to purify the scales.

A third outcome was a recognition of the fact that self-description
cannot and need not always be taken at face value. Empirical 
validation can be used to strengthen construct validity, as in the 
MMPI, but it can also be used to avoid construct validity as an 
issue, as in Strong’s Vocational Interest Blank. The success of purely 
empirical self-descriptive instruments, unlike that of those having 
a theoretical basis for item selection, left unanswered questions 
concerning the reason for the success of the empirical approach.

And hence the fourth and, so far, final refinement in the development
of self-descriptive methods. Referring specifically to Strong's
Blank, but making a point more broadly applicable, Bordin suggested
(5) that the reason for the validity of self-descriptions lies
in the fact that a self-description reflects a self-concept, and that
self-concepts have a directive effect on behavior. Thus the man
who describes himself as friendly may not actually be friendly, but
his behavior does tend to resemble th[...]
constellations of self-[...]
himself as friendly, active, and alert may not actually be friendly,
active, and alert, bu[...]t in the same way as others
who see themselves thus. His self concept is similar, and the as[...]
be similar.

Measures. The measures derived from these various self-descriptive
devices fall into three categories: they are scores for traits, for
groups, or for self-acceptance. Most of the self-[...]
[...]onal stability, [...]
from the second type to a cross between [...]
[...]e it yields scores derived from nosological [...]
[...]anslated by many users into traits which [...]
[...]se groups, a practice which Gough has car[...]
[...]veloping the California Psychological Inventory [...]
old self-other-ideal technique (34). In these, self ratings are contrasted
with ideal ratings, or favorable self descriptions are
compared to unfavorable self descriptions, in order to obtain a 
measure of discrepancy between self and ideal or of self acceptance.

Some work has been done with still another type of measure
derived from self descriptions, a measure of self consistency or in- 
tegration such as McQuitty (26) sought to derive by analyzing 
the congruence of self-attributed traits. The reasoning is that the 
integrated, self consistent person tends to attribute to himself only 
traits or behaviors that tend to be associated or that are compatible 
with each other, whereas the conflicted or unintegrated person will 
attribute to himself traits that are incompatible with each other. 
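
The discrepancy measure mentioned above, in which self ratings are contrasted with ideal ratings, can be sketched as follows. No scoring formula is given in this paper, so the mean absolute difference used here is an assumption made for illustration; the trait names, the five-point scale, and the function name are likewise hypothetical.

```python
def discrepancy(self_ratings, ideal_ratings):
    """Mean absolute difference between self and ideal ratings.

    Both arguments map trait names to numeric ratings. The choice of
    the mean absolute difference is an assumption for illustration,
    not a formula taken from any published instrument.
    """
    if self_ratings.keys() != ideal_ratings.keys():
        raise ValueError("self and ideal ratings must cover the same traits")
    diffs = [abs(self_ratings[t] - ideal_ratings[t]) for t in self_ratings]
    return sum(diffs) / len(diffs)


# Hypothetical ratings on an assumed 1-5 scale.
SELF = {"friendly": 4, "active": 2, "alert": 3, "calm": 1}
IDEAL = {"friendly": 5, "active": 4, "alert": 3, "calm": 4}

print("self-ideal discrepancy:", discrepancy(SELF, IDEAL))
```

A score of zero would mean that the person describes himself exactly as he describes his ideal; larger scores indicate a greater gap between the two descriptions.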

Evaluation. In evaluating the measures, methods, and media that
have been used to obtain self descriptions, it seems safe to say that 
we have finally developed two approaches that lead to valid results. 
One might be called the group difference method, as used by Hatha- 
way with the MMPI. It consists of starting with a well-defined 
group, be it clinical or occupational; of developing a theoretical 
model of that group from whatever data are already available, a 
model that serves as a guide in item selection or item writing; of 
purifying the scales to which items are assigned by internal con- 
sistency or factor analysis methods; and of empirically validating 
the self-descriptions against concurrent or predictive criteria. The 
second method might be called the generalized model method, and 
is illustrated by Cattell’s work with the 16 PF Test. The person who 
uses this method starts with a theory as to the significant dimensions 
of personality, which may be quite empirical in its origins, selects 
or writes items according to this theory, establishes the internal 
consistency of the scales to which these items are assigned, and 
validates them empirically by establishing the existence of hypoth- 
esized differences in selected nosological or other groups. When 
one or more of the steps has been omitted, one may well be suspicious
of the validity of a self-descriptive instrument. Some of the
contemporary personality inventories and adjective checklists have
been developed by these methods, with results which appear much
more promising than those of the less systematic and less thorough- 
going approaches used prior to World War II. 


Media: II. Biographical Inventories 


The second type of self-descriptive technique, as distinguished 
from the trait list, is the biographical inventory. The basic assumption
here is that one’s past behavior is a good predictor of his future
behavior. In his discussion of personality in terms of associative
learning Guthrie (13, p. 66) says, a person's “past affiliations [...]
better and more specific predictions of his future than any of the 
traits that we usually think of as personality traits. But it is just this 
predictive value that is required of a personality trait and nothing 
else.” 

Method. The method with which we are most familiar in con- 
nection with the biographical inventory is that of the Aptitude 
Index used by many life insurance companies since the birth of 
applied psychology, and applied with signal success to the selection 
of aircraft pilots by Kelly in the Navy and Shaffer in the Air Force 
during World War II. In the last instance (12) it consisted of using 
available knowledge or hunches as to the backgrounds of success- 
ful and unsuccessful fliers to write multiple-choice biographical 
items, and of validating responses to these against a success-fail 
criterion. Thus men who had relatives who held private pilots'
licenses proved more likely to succeed than did those without this
kind of prior contact with flying. This finding led, in turn, to the
hypothesis that prior favorable contact with flying makes for success,
and other prior-contact experiences were canvassed in order to
supply more experience items for the inventory. These items were
in turn validated and retained only if they predicted success. 
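
The item validation step described above, in which responses are checked against a success-fail criterion and an item is retained only if it predicts success, can be sketched in a few lines. Everything concrete in the sketch is an assumption: the records are invented, the item names other than the relative holding a pilot's license are hypothetical, the minimum difference of .20 in success rates is arbitrary, and no significance test or cross-validation is performed.

```python
def success_rate(records, item, endorsed):
    """Proportion succeeding among those whose answer to `item` equals `endorsed`.

    Returns None when nobody gave that answer.
    """
    outcomes = [passed for answers, passed in records
                if answers[item] == endorsed]
    return sum(outcomes) / len(outcomes) if outcomes else None


def validate_items(records, items, min_gain=0.20):
    """Keep items whose endorsers succeed more often than non-endorsers.

    `min_gain` is an assumed, illustrative threshold on the difference
    in success rates; the returned dict maps item name to that difference.
    """
    kept = {}
    for item in items:
        yes = success_rate(records, item, True)
        no = success_rate(records, item, False)
        if yes is not None and no is not None and yes - no >= min_gain:
            kept[item] = round(yes - no, 3)
    return kept


ITEMS = ["relative_holds_pilot_license", "built_model_planes", "owns_bicycle"]


def record(relative, models, bicycle, passed):
    """Build one hypothetical trainee record: (answers, passed training)."""
    return (dict(zip(ITEMS, (relative, models, bicycle))), passed)


# Invented data for eight trainees.
RECORDS = [
    record(True, True, True, True),
    record(True, True, False, True),
    record(True, False, False, True),
    record(True, False, True, False),
    record(False, True, True, True),
    record(False, False, True, False),
    record(False, False, False, False),
    record(False, True, False, False),
]

print(validate_items(RECORDS, ITEMS))
```

In these invented data the two prior-contact items separate successful from unsuccessful trainees and are kept, while the unrelated item is dropped. A real application would need far larger samples and a check on a fresh sample before any item was trusted.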


This method suggested modifications, developed simultaneously
by Siegel and myself. In my case it was in the Career Pattern Study
(32) and in studying success and failure in survival training in the
Strategic Air Command (33). In Siegel’s case it was in a doctoral
dissertation (30). Siegel has followed through by publishing his 
inventory; I, by conducting further studies of the method and its 
applications with high school boys, with engineering applicants at 
the General Electric Co., and with telephone operators at the American
Telephone and Telegraph Co.

Basically, the modification consists of pushing beyond low-order
hypotheses concerning the relationship between past and future
experiences to higher-order hypotheses that organize data on a
greater range of experience by using constellations of experiences
as a basis for inferring personality traits, [...] predictive validity
of these traits. [...]

My approach was similar in my Strategic Air Command and
Career Pattern Study research, but in applying the method to telephone
operators Martha Heyde and I have analyzed operator data
obtained from application blanks and interviews in relation to 
turnover, in order to develop a hypothetical model of the stable 
and of the unstable telephone operator. From this model, described 
in terms of biographical and trait data, we derived a list of per- 
sonality traits believed to be significant in turnover in that occupa- 
tion. Actually and presumably related biographical and life-ex- 
perience items were then written in multiple-choice form, from 15 
to 30 items surviving the various editorial processes for each of the 
hypothesized personality traits. 

We sought to measure the trait “independence-dependence” by 
answers to questions concerning the age at which the subject first 
started using make-up, choosing her own clothes, dating, taking 
overnight trips without her parents, etc. This inventory is now 
being given to operator applicants in major cities in several regions 
of the country, and turnover data are being collected on those 
hired. Three types of analyses of the results are planned. One is a 
factor analysis of the items to test our hypotheses concerning the 
trait or factor structure of our biographical data and experience 
variables. The second is a study of the relationship between these
traits or factors and turnover. The third is a cross-validation study 
of the items against turnover criteria, to develop an empirical scoring
key. Validated facts and empirical scoring keys will be compared.

Measures. Two types of measures are thus derivable from bio- 
graphical inventories of the type with which some of us have been 
experimenting. One is the conventional empirical scale, of most 
interest to the classifier of students and job applicants and, although
biographical inventories have not apparently been tried on them,
of mental patients. The other is the trait or factor scale, of which
Siegel developed ten for his general high school inventory and of 
which I have developed several for my custom-built inventories. It 
is the trait scales empirically validated against external criteria that 
are of most interest to us here, a potentially meaningful means of 
assessing personality. 

Evaluation. How valid these biographical data measures are as
indices of personality traits and how sound the procedure is for
inferring personality traits from constellations of experience data
is not fully demonstrated. I have devoted this much attention to
a new approach to biographical data, because the results so far
indicate that it may prove a valid and more objective method of
assessing personality than the self-report methods that call for trait
descriptions. 

In closing this chapter, which has of necessity covered much 
ground briefly, I should like to relate the organization which I have 
used here to that used by Leary and Coffey (23) in their work on 
personality. They distinguish among public, private, and symbolic 
levels of personality measurement. The public level is that at which 
the individual appears to others, performance in my framework; the 
private level [conscious level in Leary’s latest version (22)] is that 
at which he appears to himself, self description in my scheme; and 
the symbolic (private level in Leary’s book) is that at which he 
reveals himself in projective materials, projection in my discussion. 
Leary’s position is that each of these levels has its values and uses 
in personality assessment. I have already quoted Allport (1) to the 
same effect. While recognizing that at present we have data to 
justify the practical use of methods and information from only the 
public and private levels, from only the performance and self-de- 
scriptive approaches, it seems to me that we must agree with Leary 
and with Allport that all three levels or approaches should be used 
in the scientific study of personality if we are eventually to attain
the understanding and the control that we desire.


Foundations of Personality Measurement Theory in
Multivariate Experiment

Raymond B. Cattell
University of Illinois


PERSONALITY ASSESSMENT has inspiration
in many distinct fields, including such specialty areas as
clinical, educational, counseling, and experimental psychology. No
matter where objective testing arises, it is a pointless procedure
to make measurements and scales that are unrelated to meaningful
personality structures. Consequently personality assessment and
basic theoretical research on personality become one and the same
enterprise.


STRATEGY OF PERSONALITY RESEARCH

A brief digression on the strategy of personality research is
essential here to [...]
interested in manipulating and we then observe how one measure,
the dependent variable, changes with the changes we produce in
the independent variables. In the multivariate approach, on the 
other hand, we enter the experiment with a great number of vari- 
ables, usually allowing them to vary as they vary in nature, without 
attempting to control them artificially in any way. We then tease 
out the relationships among them by the superior statistical potency 
of the methods which have been developed, principally in the life 
sciences, since the days of classical physics. 


COMPARISON OF METHODS 


There are advantages and disadvantages to both methods, though 
you would sometimes think from the pious expressions of brass 
instrument psychologists that all scientific purity lies with the 
classical univariate method. Actually, the multivariate method can 
claim three great scientific advantages. First, it can deal with pat- 
terns and wholistic concepts. A clinician in my presence once 
remarked to a psychologist that he proposed to do some experiments 
on the relation of the superego to school achievement. Whereupon 
the classical experimentalist, a man as direct as he was eminent, 
snorted, “What is a superego? I have never seen one.” 

The implications of this remark should really be “a plague to both 
their houses.” So long as the older type of experimenter deals only 
with single variables he must remain blind to anything that requires 
demonstration as a complex pattern. But equally the clinician, 
although taking many variables into account, is unable objectively 
and scientifically to convince others, e.g., to show that the superego
is not a myth but a visible pattern, unless he commands the
powers of mathematical analysis to determine and demonstrate a 
loading pattern. The multivariate statistical methods possess this 
power, and, as I shall hope to point out in my summary, it is possible 
to define the superego, various drives, and a number of complex 
temperament patterns to a useful degree of exactitude by factor 
analytic means. Furthermore, when these constructs are measured 
as factors, they can enter into exact experiment as readily as any 
single concrete variable. 

The second advantage of the multivariate method is its sheer 
business efficiency. If you go to the labor of measuring, say, two 
hundred variables, in a hundred pairs of two, on a large population, 
you get by classical experiment evidence on 100 relationships. If, 
on the other hand, you do the same amount of experimental work
and use a multivariate method of analysis, you throw light on the
nature of approximately 2,000 relationships. [...]
possess the correlations in a matrix of 200 variables. [...]
is something more than an enormous, twenty to one, gain [...].
When the 100 relationships of the univariate [...]
design are taken from many different samples, as commonly happens
when an unfortunate reviewer of an area in the Psychological Bulletin
is trying to make sense out of a hundred independent [...],
the findings are essentially incomparable, statistically, [...]
coming from different samples, and they are of questionable comparability
experimentally, because they are always [...] by the
idiosyncrasies of the various investigators and their locations. The
hypothesis-testing power, and especially the hypothesis-creating
power of the multivariate experiment is here far greater, because
we know that all the relations are comparable, having been made
on the same group.
Additionally the factor analytic method has special resolving
powers, in terms of discerning meaningful patterns among [...]
relations. We get what a philosopher might call “emergents” from
the accumulated, criss-crossing relationships, such as can [...]
come from reasoning about single relationships or from the [...]
game of partialling this influence out from that, one at a time. In
personality research it means that we are enabled to detect the
major structures operating across this whole field, whereas when
one works with variables two or three at a time, trying to partial
out this from that, one is apt to run around in the prison circle of
one’s own feeble vision of possibilities.


The third advantage of the multivariate method does not belong
to its general use in psychology, but is specific to its application in
the field of personality and clinical study. It resides in the fact
that human beings decline to [...]
experiments on matt[...]
you wished to study [...]
law coming to stay [...]
objections to the manipulative experimental design
in the field of personality. The first is that you ought not to do it,
and the second is that, if you throw ethics aside and proceed, the
artificial insult of the experiment may create a situation quite
different from the naturally occurring one. When you chop pieces
off a man’s adrenal glands you do something more than reduce his 
adrenal functioning. The multivariate experimenter, like the clini- 
cian, allows life itself to make the experiments, in naturally func- 
tioning organic wholes, and then extracts the causal connections 
by superior statistical analytical procedures. If you stick to the 
controlled experiment in regard to emotional learning, etc., you are 
compelled, like Mowrer, Miller and others, to move increasingly
away from human beings to animals. This leaves you with the 
impossible task of generalizing across from animal behavior to 
something vaguely analogous in human behavior. In fact, meth- 
odologically you have allowed the rat to lead you into a worse 
cul-de-sac than in any maze you ever constructed for him. 

If now you look at the so-called clinical method with this broad 
dichotomy of method in mind you will notice that it has several 
close parallels to the multivariate method. Both deal with major 
emotional events in the lives of human beings, allowing life itself to 
provide the source of manipulation, and both work upon wholistic 
perceptions of patterns and relations, rather than upon single variables.
Indeed, I think it can be seen that there is really no such
thing as a separate clinical method (unless we are talking about a
therapeutic method), for, when stripped down to its essential, formal
procedures, the clinical method is the multivariate method. Unfortunately,
though it is formally the multivariate method, it lacks
scientific rigor, proceeding by intuition and fallible human memory,
instead of being carried out on exact measurements by an electronic
computer, using a far superior memory and a fully explicit statistical
procedure. In terms of progress in the scientific study of personality,
the clinician has his heart in the right place, but perhaps
we may say that he remains a little fuzzy in the head. The salvation
of the clinical method lies in filling out its cloudy procedures by
structural statistics, decidedly more complex, incidentally, than
those known to univariate methodology. Factor analysis is only one 
such statistical model, though it is the best we have achieved so far. 


MEASUREMENT FOLLOWS STRUCTURE 


But let us now return from this survey of foundations to my first
assertion that measurement must follow structure. I am aware that
this reiteration of “no testing without structure” makes me as popular
among certain kinds of test constructors in educational and
clinical psychology as a Baptist minister reminding people of the
Ten Commandments in an establishment for organized sin. I
would repeat that you may use the most [...] procedures,
refining Guttman, Coombs, and others to [...],
and still be merely engaged in a sort of psychometric [...]
as far as any psychological understanding of personality
is concerned. If your scale is not guaranteed to deal with something
psychologically meaningful and organic, it [...]
psychological procedures. And, incidentally, it does not seem
sufficiently realized that a Guttman scale, or any other [scaling]
method per se, does not guarantee a factor pure scale. A correctly
scaled scale may still be of any degree of factorial [complexity].
When I mention a demonstrable functional unity in what follows,
I refer technically not only to a pattern of covarying parts which
can be demonstrated as a unique, replicable, [...]
in terms of factor analysis, but also to a pattern which additionally
could be shown to function as a whole by univariate, controlled
experiment. That is to say, the pattern should show itself not only
by a person who is higher in one element of it being consistently
higher in the other elements, but also by the parts varying together
from occasion to occasion when an experimental influence which
changes this trait is brought to bear.
Within multivariate methods, this means that the factor pattern,
out of which the construct or concept arises, must be demonstrated
not only by the classical R-technique, but also by the longitudinal
P-technique. It may also be demonstrated by other factor analytic
experimental designs, such as the condition-response design, in which
one simultaneously factors in a single matrix both the various stimuli
that might cause the pattern to change in level, and all the manifestations
by which the pattern is recognized. In short, to ensure that
a unitary trait is sound in wind and limb, it should be thumped in
many different parts. Thus, in Scheier’s work on the [measurement of]
anxiety, which has come out with certain clean cut results which
I shall discuss in a moment, it was first demonstrated that some ten
psychological and physiological variables repeatedly emerged as
salients in a single factor in studies dealing with individual differences
in anxiety level, i.e., by R-technique.


After this R-technique demonstration of the boundaries of anxiety
as an individual difference trait, a longitudinal study was made in
which the fluctuations (in these salient variables discovered on the
R-technique factor) were measured from day to day under the various
naturally occurring anxiety stimuli of daily life. A longitudinal
factor analysis, by P-technique, then turned out much the same factor
pattern as had been obtained by R-technique. There were some
differences of emphasis, but it was clearly the same anxiety factor,
marked by the same major variables.
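
The two-handed comparison just described can be sketched in a few lines of present-day code. What follows is an illustration under stated assumptions and not Scheier's actual analysis: the data are simulated, the six marker weights are hypothetical, the first principal component of the correlation matrix stands in for a properly rotated factor, and Tucker's congruence coefficient is used as one convenient index of agreement between the two loading patterns.

```python
import numpy as np

def first_factor_loadings(data):
    """Loadings of the first principal component of the correlation matrix.

    `data` has one row per observation and one column per variable.
    This is a simplified stand-in for a rotated factor solution.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues ascending
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    # The sign of an eigenvector is arbitrary; make the loadings sum positive.
    return loadings if loadings.sum() >= 0 else -loadings

def congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(0)
weights = np.array([0.8, 0.7, 0.6, 0.7, 0.5, 0.6])   # hypothetical anxiety markers

# R-technique: 200 persons measured on one occasion (rows are persons).
trait = rng.normal(size=(200, 1))
r_data = trait * weights + 0.6 * rng.normal(size=(200, 6))

# P-technique: one person measured on 100 days (rows are occasions).
state = rng.normal(size=(100, 1))
p_data = state * weights + 0.6 * rng.normal(size=(100, 6))

r_loadings = first_factor_loadings(r_data)
p_loadings = first_factor_loadings(p_data)
match = congruence(r_loadings, p_loadings)
print("R-technique loadings:", np.round(r_loadings, 2))
print("P-technique loadings:", np.round(p_loadings, 2))
print("congruence:", round(match, 3))
```

The only difference between the two analyses is what the rows of the score matrix represent: persons in the one case, occasions in the other. A congruence near 1 corresponds to the finding that "it was clearly the same anxiety factor."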

A third phase of the research consisted in measuring a large num- 
ber of people on this array of variables and then submitting them, 
in an analysis combining the factorial design of analysis of variance 
with factor analysis, to a number of what are commonly considered 
anxiety provoking stimuli, such as important examinations, a dis-
cussion of imaginary diseases, some probing of their economic con-
dition, etc. Correlating the stimulus differences with all the re-
sponse differences resulted in the reappearance of the same anxiety
factor pattern. In this condition-response design, however, it was
additionally loaded with the stimuli which are effective in produc- 
ing the anxiety response pattern. Scheier’s work on the measure- 
ment of anxiety thus illustrates the full present scope of multivariate 
method usage and shows how a practical measurement of high va- 
lidity and determinateness can result. 
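
The condition-response design can likewise be given a rough sketch. The example below is an assumption-laden illustration only: the stressor is a hypothetical 0/1 indicator, the three response measures are simulated, and the first principal component again stands in for the factor. Its sole purpose is to show what it means for a stimulus to be "loaded" on the same factor as the responses when both are placed in a single correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical stimulus indicator: 1.0 if the person was measured under a stressor.
stimulus = rng.integers(0, 2, size=n).astype(float)

# Latent anxiety level: individual differences plus an effect of the stimulus.
anxiety = rng.normal(size=n) + 1.5 * stimulus

# Three simulated response measures, each reflecting anxiety plus noise.
responses = np.column_stack([
    0.8 * anxiety + rng.normal(size=n),
    0.7 * anxiety + rng.normal(size=n),
    0.6 * anxiety + rng.normal(size=n),
])

# Stimulus and responses enter one matrix and are factored together.
matrix = np.column_stack([stimulus, responses])
corr = np.corrcoef(matrix, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
if loadings.sum() < 0:          # resolve the arbitrary eigenvector sign
    loadings = -loadings

for name, value in zip(["stimulus", "resp_1", "resp_2", "resp_3"], loadings):
    print(f"{name:9s} {value: .2f}")
```

In this simulation the stimulus column takes a substantial loading on the same factor as the response measures, which is the sense in which the factor is "additionally loaded with the stimuli."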

This digression on complex issues of method may have been so
brief as to evoke the comment that for those who knew them already,
it was unnecessary and for those who did not, it was too short
to carry the full implications. But we must move on with the statement
that if this agreement is fully examined, it provides justification
for believing factor analytic findings rather than clinical impressions.
It also prevents our aligning ourselves, on the other hand, with that
compulsively accurate psychometrics of scales which still narrowly
persists in the old faculty psychology of supposing that where there
is a single name there must be a single function.


A BRIEF REVIEW OF FACTOR ANALYTIC FINDINGS


Although factor analytic findings over the last fifteen years have
been evaluated elsewhere (4), it will be helpful here to [...]
sketch of the substantive findings which are the necessary basis
for the measurement theory I have to discuss. These results in 1958
are largely the outcome of certain aims and canons of research and
method worked out in a first attempt to integrate the field, namely,
Description and Measurement of Personality (2) twelve years
ago. In the first place, our laboratory has always aimed to gather
data widely and simultaneously over the three chief possible media
of personality observation: L-data or life records of behavior in situ,
Q or questionnaire data, and T or objective test data. In the life
record medium, personality is observed in the natural life situation,
by time sampling, rating, or keeping of records on [...]
achievements, accidents, etc. This is, of course, the [...]
medium, in the sense of being an external or cultural criterion for
any testing. In the questionnaire medium [...]
giving his impressions of him[self] [...]
and willingness to disclose, [...]
[perform]ance or response in a te[st] [...]
[do]es not know what aspects of hi[s] [...]
[an]d interpreted.

[...] with existing concepts in the field, [...]
popular, clinical, and [...]
concepts is then possible [...]
the same words and situations of [...]
clinicians, guidance [...]
the advantage that it permits [...];
that is to say, the notion of a [...]
total realm of behavior. [...]
the factor analytic approach, [...]
concept of a population of variables
becomes very important.


CANONS OF PROCEDURE


In adopting this simultaneous, three-fold observation of personality,
it was our conviction that any important dimension of personality
should break through all three, showing itself at once as a
factor pattern in behavior rating data, in the questionnaire-response
patterns, and in objective personality tests. This theory has been
only partially vindicated, and some tantalizing exceptions persist.
Throughout this discovery of structure as a basis for measurement
it has been a canon of research procedure that factors shall be determined
by simple structure principles, and other principles permitting
a unique, objective factor solution quite independent of any
psychological pre-conceptions which the observer may have about
personality structure. Psychologists who have been using factor
analysis by rotating for “psychological meaning” are merely having
a pleasant game perpetuating their own superstitions or prejudices.

1 Some definition of “objective” [...] of objectivity [...]
(1) objectivity of scoring, plus (2) objectivity in
the sense of not involving self appraisal. It is in the latter, complete, sense
[that] “objective” is used here, and we would suggest the term conspective
for a test that is only objective as to its scoring. This implies that it has [...]
scoring methods. [...] used the t[...]
between the [...] test and the conspective [...],
whereas the latter deals only with th[...]
to agree with the us[...]

Some years ago I talked at Tubingen with the German psychol- 
ogist, Kretschmer, who has done such striking clinical experimental 
work in bringing out the full nature of the schizothyme tempera- 
ment pattern. Whenever I showed him an experimental pattern 
in factor analysis that agreed with his clinical impressions, he would 
say “factor analysis is a remarkably important scientific tool.” But 
when I showed a pattern that corresponded to no known clinical 
pattern, his inclination was at once to assume it to be an artifact, 
and immediately to lose interest in it. This I cite only as a rather 
amusing and well developed instance of the attitude that, together 
with some defects of statistical education, has kept the clinical psy- 
chologist from understanding the importance of these factored 
measurement developments for his work. 

Indeed, there is no need especially to pillory clinical psychologists, 
for psychologists in general seem rather prone, relative to physical 
scientists, to dependence on subjective conviction. Yet if we are 
dealing with a science rather than a religion, we should welcome 
objective methods which surprise us by turning up something that 
does not in the least fit what we knew before. Factor analysis has, 
in fact, produced surprises in the clinical field, for those who can 
see them, much as the microscope did in biology. Notably it has 
turned up at least a dozen clear cut patterns in the personality 
field, that contribute as much to the variance of behavior as any 
such familiar concepts as schizothymia, ego strength, dominance, 
etc., which have nevertheless never been visible to the naked eye 
of the clinician or named or discussed. These structures have not 
yet been accepted as the challenge to existing clinical theory and 
formulations that they really are, for they have power to yield pre- 
dictions of criterion behavior impossible from the familiar concepts. 

A third canon of our research has been that the factor patterns
shall be replicated, in at least two independent researches, before
we begin to give them serious theoretical consideration. To put 
this canon into effect requires considerable planning in research, 
to ensure that a sufficiency of identical variables to permit matching 
are carried over from one study to another. For in this day and age 
we can no longer go along with the idea that the identity of a 
factor in one study with that in another study can be established 
merely by the psychologist’s impression of the psychological sim- 
ilarity of the two. There must be accurate carrying over of salients, 
and the use of a quantitative index, such as the salient variable 
similarity index, to ensure that the patterns really are alike. 
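
A quantitative check of this kind might be sketched as follows. The salient variable similarity index itself is not defined in this text, so the sketch does not attempt to reproduce it; it substitutes two simpler, named stand-ins: Tucker's congruence coefficient, and a plain count of variables that are salient with the same sign in both studies. The loadings are invented for illustration and the 0.30 salience cutoff is an assumption.

```python
import numpy as np

def congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def shared_salients(a, b, cutoff=0.30):
    """Count variables salient (|loading| >= cutoff) with the same sign in both studies."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    both_salient = (np.abs(a) >= cutoff) & (np.abs(b) >= cutoff)
    same_sign = np.sign(a) == np.sign(b)
    return int(np.sum(both_salient & same_sign))

# Invented loadings of six carried-over variables on a factor in three studies.
study_1 = [0.62, 0.55, -0.48, 0.05, 0.10, -0.02]
study_2 = [0.58, 0.49, -0.51, 0.12, -0.04, 0.08]   # intended replication
other = [0.03, -0.08, 0.10, 0.60, 0.52, -0.45]     # a different factor

print(congruence(study_1, study_2), shared_salients(study_1, study_2))
print(congruence(study_1, other), shared_salients(study_1, other))
```

Either index can be computed only for variables that appear in both studies, which is why a sufficiency of identical variables must be planned into the research from the start.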

A fourth principle has been that we should not be too hasty in 
interpreting the factors, but should be content to designate them 
by an index number in some agreed universal index among psy- 
chologists, such as that which I have proposed as an international 
index in the current issue of the Japanese International Journal of 
Psychology. A factor will commonly become a recognized part of 
the scenery, and a basis for measurement in a good unifactor scale, 
some years before its nature is fully understood. In the case of 
about half a dozen of the discovered factors, namely, ego strength, 
intelligence, anxiety, general neuroticism, schizothymia, and super 
ego strength, I think the pattern is sufficiently identical with any- 
thing that has ever been called by that name by a responsible 
psychologist, to justify using these customary names and interpretations,
such as they are. In about another half dozen factors a
pretty definite idea can be formed of the physiological, experiential,
or dynamic influence responsible for the pattern. For example,
surgency-desurgency level is essentially the level of general inhibition,
and seems correlated with frequency of past punishment. The
factor we have called Q3 seems to represent the degree of dynamic
[...] in the self sentiment, and so on. While these explanations
are not as perfect and perhaps not as lurid as those that psychoanalysts
are fed with their Freudian mother’s milk, they have
[...] with demonstrable behavior patterns, [...]
[measur]ements of individual differences, with known
validity and reliability, which can be made the basis of experimental
[...]. Surely it is high time that theory began
[...] these measurable behavior patterns, frequently
[...], instead of the vaguely
[...] unsubstantiated behavior patterns and
sequences which many clinicians take as the basis for elaborate
theories.

A last canon of design, in these experiments to put personality
measurement on a functional basis, and functional concepts on a
measurement basis, is that continuity should be established in the
patterns over the whole developmental age range. That is to say,
not only should the functional unity be established at one age level,
by the above two handed use of R- and P-techniques, but the age
range should be cut by such studies at three or four year intervals,
to establish the mode of growth, as one might take slices across
the stem of a plant. This is a big order, and it has not yet been filled,
but sections have recently been taken at 12, 8, and 4 years of age
and are in press.


LONGITUDINAL ANALYSES


The hypotheses of measurement here are that some patterns
might be expected to persist over all age sections more persistently
than others. For example, an ability like general intelligence, or a
temperament trait associated firmly with some physiological or
body-build component, would be expected to show itself, perhaps
with some modifications, from the earliest testing period. On the
other hand, an environmental mold pattern, such as the superego
or a sentiment to a specific object, might be expected to appear
only at a given age and to show more pronounced developmental
change in the loading pattern. The work on personality factors was
initially done, for good reasons, at the young adult level, but the
researches of Coan, Peterson, Gruen, and others, maintaining the
combinations of life record data, questionnaire data, and objective
test data, show that all but three or four of the factors established
in the adult level can be traced down through childhood and even
into infancy. For example, in the factor analyses of time samplings
of behavior, made at the four year old level, we can clearly see the
cyclothyme-schizothyme factor, the dominance-submissiveness factor,
the surgency-desurgency factor, [...] the paranoid factor, the ego
strength factor, and so on, operating in the nursery school world.

On the practical side, for the benefit of those who wish to do 
longitudinal research in personality over a sufficient interval, we 
are in process of constructing measures of these factors in the ques- 
tionnaire medium. Thus, 14 of the 16 factors in the adult 16 Per- 
sonality Factor Questionnaire (3) can be demonstrated and set up 
in the range from 12 to 16 years of age, in a test called the High 
School Personality Questionnaire (5). Twelve of the factors can 
still be clearly recognized at the nine year level and are being put 
into the Child Personality Questionnaire. Peterson has made sets
of questions, which must, of course, be given orally, which get at
these personality factors at the nursery school level. There are many 
technical difficulties in getting a good series of personality factor 
questionnaires to operate meaningfully from the four year level 
right up through the adult level, but these difficulties must be over- 
come, because such longitudinal studies are essential both for un- 
derstanding personality and for the success of applied psychology. 

Indeed, there is both a great need and great opportunity at the 
moment for longitudinal studies in personality structure. Such stud- 
ies will, first, establish more definitely the identity of the factors 
found at the earlier level with those at the later level, by repetitive 
measures on the same children at intervals; second, show which 
factors are most subject to environmental influences, and if so, to 
what environmental influences; third, show the general curve of 
change in these personality factors in the same sense as we have 
established the normal trend in the intelligence factor; and last, 
suggest in what way the pattern of behavior typically changes with 
age. 

In regard to these changes in the test weights in the pattern to 
be measured, we may instance that the ego strength factor in four 
year old children loads freedom from temper tantrums, freedom
from enuresis, infrequency of headaches and psychosomatic disorders,
infrequency of manifestations of jealousy, etc. By eleven years
of age, enuresis has dropped out of the loading pattern, the main
emotional stability versus instability variables remain, and some
new elements have come in. Similarly, in the dominance factor,
disobedience, sulking and “talking back” are prominent in the time
sampling variables at the early age, whereas by the adult level,
[...] obedience has become [...],
[talk]ativeness has disappeared and the dominant adult
[is, if an]ything, rather more silent than the average.


VALIDITY OF PERSONALITY MEASUREMENTS


[...] validity, on the one hand, and
[...] other. [...]
by the correlation between a given
[...] in questionnaire or rating scale and the factor as derived
[...] criterion. Thus we might validate a test
against such factor constructs as anxiety, or against ego strength, or
against surgency, or against schizothymia. The external or cultural
validity is never validity singular but validity plural. That is to say,
there are thousands of things against which a factor’s predictive 
power could be tried and the correlations known in the interests of 
interpretation; but no one of them is the criterion. For instance, 
the use of the 16 PF test has yielded a great many significant per- 
sonality factor correlations, for example, with success in school, 
prognostic rating in a clinic, automobile accident proneness, alco- 
holism, etc., and these have greatly enriched the original interpreta- 
tions based on factor content alone. 

One of the first inquiries to be made about the nature of a factor 
—indeed it should be the routine inquiry before making any more 
specific hypotheses—is to test whether it is largely hereditarily de- 
termined or substantially a product of learning and environment. 
Obviously, this is of basic importance both for theory and for the 
proper practical use of the measurement. Indeed, one of the chief 
claims of the factorially unitary measurement is that it permits something
more than merely statistical prediction, namely, an estimate
of criterion performance that takes into account whatever general
psychological knowledge about the natural history of a trait permits
us additionally to infer. Fortunately, some fairly extensive nature-nurture
studies have already placed the principal factors in perspective,
in relation to such older factors as Spearman’s “g.” For example,
we know from multiple variance analysis studies that the
cyclothyme-schizothyme factor is largely hereditarily determined, that a
surgency-desurgency source trait is largely environmentally deter- 
mined, that the level of dominance-submission is about 50-50 a prod- 
uct of constitution and familial-environmental influences, and so 
on. 

Although the greater meaningfulness of personality measures 
based on factors arises from the possibility of building around each 
of these functional unities a rich natural history, the actual growth 
of such knowledge has barely begun, because of the extreme recency 
of satisfactory proof of the factors themselves. Meanwhile, the 
tests that can be and have been spawned with much greater ease 
have accumulated gargantuan standardizations, as well as the mo- 
mentum of enormous numbers of past students whose gifts seem 
to be exclusively in the rituals of administering them. Like the
first small mammals entering a world possessed by the dinosaurs,
the lately arriving factored tests have a validation largely in the
future.


DYNAMIC CALCULATION


The most recent, and as yet scarcely noticed, development of 
factored measures lies in that area of dynamic calculation which 
is so vital to clinical psychology and to motivation theory. This 
rests on the discovery that the drive patterns in man can be estab- 
lished by the factoring of collections of objective motivational 
measures. Sex, self assertion, fear, and six other drive patterns have 
been replicated now in three successive studies by these means. 
Alongside these easily recognizable drive patterns, there occur pat- 
terns that closer scrutiny suggests can only be acquired dynamic 
sentiments, such as the self sentiment, the sentiment to religion, 
and the sentiment pattern of attitudes and interests acquired about 
one’s profession. [...]
[...]ed as a stimulus-response habit.
[...] interest, that is, the need to react,
[...] can be expressed by a specification
[...] levels of the various drives and
[...] as measured in the given individual.
[...] book on Personality and Motivation
[...] for the postulates, and the chief
[...] dynamic calculus which develops on this
[...] of the drive pattern [...] makes possible
[...]. For example, it is found that the
[...] be resolved into three distinctive components
[...] sentiment structures. On the basis of such
[...] distinct and replicatable [...]
[investi]gation of dynamic laws and motivational
[...] more exactly and more subtly than before.


FACTORING OF DRIVES


A development of crucial importance for clinical theories which
these measurements have made [...]
[P-te]chnique, their quantitative contribution to the [...]
[ex]perience with the case, and the [...]
quantitatively in terms of the dynamic [...]
[agreeme]nt is reasonably good, I think we shall have
demonstrated a very powerful new clinical tool. Parenthetically,
I may add that to the extent that the agreement turns out to be 
imperfect, one may reasonably have doubts whether the psychia- 
trist or the dynamic calculus is wrong. In fact, our first step if the 
agreement is inadequate will be to bring in a second psychiatrist 
to see how far he agrees with the first! 


NEW TYPES OF TESTS


The development of structured measurement in motivation has 
gone hand in hand with the invention of quite new types of objec- 
tive tests, no longer requiring the actual scores on component in- 
terests and attitudes to rest on the verbal opinionnaire or the open- 
ended, projective type of test. These objective motivation measures 
include some devices using the so-called projective principles, to- 
gether with physiological measures of motivation, learning measures, 
and many others. As far as theory is concerned, the interesting
point of this analysis is that we seemed to get three distinct moti- 
vation strength factors, apparently corresponding to the id, ego, 
and superego contributions in any given interest. 


PRACTICAL IMPLICATIONS 


These theoretical implications will doubtless be much scrutinized
and debated, but there are some immediately dependable conclusions
for the practical man. First, the classical opinionnaire method
[of measuring] attitude-interest strength by verbal self-evaluation
has [...] poor validity, accounting for only about a fifth to a tenth
[of the variance] in the main motivation factors, however they are
interpreted. Consequently, generalizations about attitudes and interests
based only on this instrument could be highly fallacious as
far as the total variance in interest strengths is concerned. Secondly,
the [projective] tests, or misperception tests as we prefer to call them,
[are not] clearly distinguished by any factors from the rest of the
[motivat]ion measurement devices. Thus in the theoretical reconstruction
suggested by this work, the classification of motivation
[...] would fall principally into id, ego, and superego
[...], and the division into projective and non-projective,
physiological and non-physiological, etc., signs of motivation
strength becomes rather pointless.


MEASUREMENT OF STATES AND TRAITS


Any comprehensive view of progress in personality assessment 
must include the measurement of states as well as the measurement 
of traits. The work of Scheier (11) on the measurement of anxiety 
provides, as I have briefly mentioned, a very neat methodological 
demonstration of conceptual and statistical problems involved in 
separating states and traits. Scheier has now checked the anxiety 
state pattern in two independent factor analytic studies, and the 
anxiety trait pattern in no fewer than eight independent researches. 
There is enough similarity in the state and trait patterns to justify
the popular habit of using the term anxiety for both. Both load
a particular set of markers in the questionnaire realm, in objective
personality tests, and in physiological response measures, though
the emphases are interestingly somewhat different. For aught the
early scale makers knew, there might have been four or five distinct
and uncorrelated factors of anxiety rather than a single factor.
As it turns out there does seem to be a single factor of anxiety,
but these premature scales mix this anxiety factor with the quite
distinct neuroticism factor and a number of other irrelevant and
contaminating factors. It is really not surprising that anyone surveying
the literature of the past ten years is discouraged by almost
every finding being matchable by an equal and opposite finding,
for even when investigators verbally defined anxiety in the same
way, they frequently used a different test for it. It is rather early
to see what the full impact of factor analytic work on anxiety measurement
will be in giving a new momentum to insightful clinical
research. The instrument could permit the emergence of a whole
series of new laws and therapeutic certainties, replacing the present
gropings toward scales of obscure meaning.

One of the certainties which emer[ges] [...]
[...] distinct factors. They have a slight obliquity;
[...] with substantiating evidence, they can be [...]
[...] are, additionally, independent
of measured psychoticism.

These, however, are only local areas of illumination in the factor
analytic picture, rendered clearer by our clinical familiarity with
the phenomena. Outside these brightly lit spots, in the domains of
the remaining dozen or more personality factors, definitely locatable
but uninterpreted, there exist obscurities and some intriguing paradoxes,
now engrossing the pure researcher. For example, for the
last four years it has been known that two substantial second order 
factors can be found among the primary personality factors as rep- 
resented in the 16 Personality Factor Questionnaire. 

The first of these second order factors brings together the sep- 
arate dimensions of ego weakness, high ergic tension, and the mys- 
terious O factor, sometimes called guilt proneness, and which we 
have so far hung on to mainly by the symbol O. The second of 
these massive second-order factors reveals the existence of a com- 
mon influence behind surgency, cyclothymia, dominance, and the 
factor which we have called parmia, which is short for high parasympathetic
system dominance. These patterns were confirmed by
the independent study of Karson, at the University of New Hampshire,
and I think that we can now agree that the second of these
two large factors gives substance to the Jungian concept of
extraversion-introversion as a definite, invariant second order factor,
[rather than] as the mere correlation cluster which it was once thought
to be. That is to say, the general personality dimension of
extraversion-introversion expresses itself in five relatively independent primary
[factors: surgency,] dominance, parmia, cyclothymia, and lack of
self-[sufficiency]. The quality of an individual’s extraversion therefore
[has] to be defined by his separate scores on these five components.
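
What "second order" means computationally can be shown in a brief sketch. The correlations below are invented for illustration and are not taken from any of the studies cited; the first principal component of the correlations among the primaries is used as a simplified stand-in for a rotated second order factor.

```python
import numpy as np

# Invented correlations among four oblique primary factors. The first three
# are given the names of the primaries said to form the first second order
# factor; the fourth is a hypothetical primary unrelated to them.
labels = ["ego weakness", "ergic tension", "guilt proneness", "unrelated primary"]
primary_corr = np.array([
    [1.00, 0.45, 0.50, 0.05],
    [0.45, 1.00, 0.40, 0.00],
    [0.50, 0.40, 1.00, 0.05],
    [0.05, 0.00, 0.05, 1.00],
])

# A second order analysis factors the correlations among the primaries
# themselves, exactly as a first order analysis factors the variables.
eigvals, eigvecs = np.linalg.eigh(primary_corr)      # eigenvalues ascending
second_order = eigvecs[:, -1] * np.sqrt(eigvals[-1])
if second_order.sum() < 0:                           # arbitrary sign
    second_order = -second_order

for name, loading in zip(labels, second_order):
    print(f"{name:18s} {loading: .2f}")
```

Second order factors can appear only when the primaries are allowed to correlate, that is, when the first order rotation is oblique; with strictly orthogonal primaries the matrix above would be an identity matrix and nothing would remain to factor.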

[...] the [...] massive second order factor thus quickly
[...] long popularly discussed, the first large pattern
[...] ergic tension, etc., as I have described,
could [...] interpreted. However, when Scheier began
his work [on ob]jective anxiety measurements, he included the
16 Personality Factor Questionnaire in his study, and when he determined
[the loa]ding of his objective test anxiety factor on these
[...] measurements, the pattern of loadings turned out to
[...] as that found in the factorization of the questionnaire
[...]. In other words, the second order factor among the
[...] identical with the first order factor among the
[...]. On looking at the psychological picture
this [...] good sense, for it tells us that high anxiety
is [...] hence by high ergic tension, that is
[...] of drive expression, and by the temperamental
guilt proneness component.

Without time for expanding these comments, I would point out
that we now have three instances where a second order factor in 
the questionnaire realm has become recognized and confirmed as 
a first order factor in objective, instrumental tests. The perplexing 
lack of relationship between the questionnaire and behavior rating 
factors on the one hand, which mutually agree well, and the ob- 
jective test factors on the other, which have previously defied align- 
ment, therefore begins to resolve itself. The objective test factors 
are second order factors to the primaries found in the other media 
of observation. This is only one illustration of the increasing inter- 
connection and illumination of structure which is now beginning to 
take place in the factor analytic realm. However, I want to add a 
technical word of warning. I think this fitting together of the jigsaw 
puzzle can continue only insofar as we all give far more attention 
to good technical precision in our first order factor analyses than 
has been typical of work in the last ten years. In particular, far 
greater diligence is necessary in getting accurate rotation, to a 
plateau of maximum percentage of variables in the hyperplane, 
whenever simple structure is alleged to be obtained.
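
The hyperplane criterion can be made concrete with a small sketch. It is an illustration under stated assumptions only: the loading matrix is invented, the hyperplane width of ±0.10 is an assumed convention, and the search is restricted to orthogonal rotation of two factors in one-degree steps, whereas simple structure work in practice is usually oblique and involves many factors.

```python
import numpy as np

def hyperplane_percentage(loadings, width=0.10):
    """Percentage of loadings whose absolute value is below `width`."""
    loadings = np.asarray(loadings, dtype=float)
    return 100.0 * float(np.mean(np.abs(loadings) < width))

def rotate(loadings, degrees):
    """Rotate a two-column loading matrix orthogonally by `degrees`."""
    theta = np.radians(degrees)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    return np.asarray(loadings, dtype=float) @ rotation

# An invented simple structure: each variable loads one factor only.
simple = np.array([[0.70, 0.00], [0.60, 0.05], [0.80, -0.03],
                   [0.02, 0.70], [-0.04, 0.60], [0.00, 0.65]])

# Disturb it by 35 degrees to mimic an arbitrary unrotated solution.
unrotated = rotate(simple, 35.0)

# Scan trial rotations and record the hyperplane percentage at each.
angles = np.arange(-90.0, 91.0, 1.0)
counts = [hyperplane_percentage(rotate(unrotated, a)) for a in angles]
best = max(counts)
plateau = [float(a) for a, c in zip(angles, counts) if c == best]

print("unrotated:", hyperplane_percentage(unrotated))
print("best:", best, "at angles", plateau)
```

The maximum is typically reached over a short run of neighboring angles rather than at a single point, which is the "plateau" referred to above; in this two-factor case a second, equivalent plateau appears 90 degrees away, where the two columns merely exchange places.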


AVOIDANCE OF GENERAL THEORY 


If I have referred insufficiently to general theories, it is because I 
believe that psychology particularly needs to guard itself at this
stage of development from getting into cloudy regions of grandiose
theory, instead of seeking well established laws and concepts, susceptible
to accurate measurement. In a healthy science, wider
theories arise from well determined regularities, which we call laws.
[...] thought on this, I think you
[...] dependable laws in psychology
[...] [fin]gers, not in the hundreds with
[...] the physical sciences. This is both
[...] terms of getting nowhere in a great hurry. Several shrewd observers
have pointed out one feature that constantly seems to distinguish 
research in the social sciences from research in the physical sciences: 
the physical sciences typically show an architectonic growth, in 
terms of one research building constructively upon another, where- 
as in the social sciences there are an enormous number of unre- 
peated researches, in which particular variables are used by a 
particular investigator and never touched again by anyone. The 
resulting scenery is a shanty town of one story hovels instead of 
the skyscrapers which the physical sciences build. 

I think there are three major, and doubtless many minor, reasons 
for this. First, we have tried to ape the physical sciences by con-
centrating on the univariate controlled experimental method, in-
stead of the multivariate experimental method which is alone truly
adapted to the far more numerous variables and complex determi- 
nation with which we deal. Second, our work needs far more 
mathematical discipline than our students have been willing to 
acquire. Third, there has been insufficient social organization of 
research. By this last I mean that we have been inclined to ascribe 
our failures wholly to defective technical methods when frequently 
they are due to defective coordination of research. Better social 
organization can come either from the organization of teams and
institutions or through more sensitive conscience and vision in the
individual research worker. One of the immature features of our
science seems to be a bizarre teenager sense of honor, which dictates
that no individual with claims to creativity could possibly use the
same variables as any other individual and certainly not stop to
replicate any extensive experiments previously done. There is also
what I would call magpie research in which the investigator seems
attracted for purely emotional reasons by the glitter of a particular
variable or piece of apparatus, e.g., the psychogalvanometer, social
prejudice, colored inkblots, sociometric count, or what have you,
and centers his research on a mere variable without any broader 
theoretical or conceptual framework. 


SOLUTION: ORGANIZATION AND DIVISION OF LABOR


I believe a great deal of progress could be made by a very simple
practice indeed, namely, that of putting other people’s variables
into one’s correlation matrix. People can continue to hold quite
different theories about what is happening behind these variables, 
but at least if we linked hands on some marker variables we could 
with comparative certainty begin to relate and debate the [...]
through the intercorrelation matrices. We do this as a routine
procedure in our own factor analyses, taking a minimum of two marker
variables for each well known factor, for example, from the work
of Eysenck, Guilford, and other previous researches, when we begin a
new investigation on the next factor of theoretical interest. Factor
analyses carried out without such markers from the known terra
firma are strictly uninterpretable. They inhabit a solipsistic world
of their own, with no past and very little future, and might as well
be carried out upon the moon. On the other hand, an overlap of
variables must, sooner or later, mean an overlap or integration of
ideas. If people want to be productive, they should get their
variables together.

This brings me to consider an important respect in which the de- 
velopment of our own personality assessment researches may be 
considered to lack integration. The charge must be admitted that 
factor analysts are so engrossed in establishing the form and nature 
of factors, with statistical elegance, in laboratory measures, that they 
have made quite inadequate effort to show the clinician, the edu- 
cator, the industrial psychologist, and others what these factors 
mean in more popular terms, and particularly to interpret them in 
terms with which the general psychological theorist is familiar. But 
let us not mistake the principle of division of labor, which is neces- 
sary in a highly specialized world, for any lack of integration, which 
is not. It happens that a rather unusual assemblage of skills,
apparatus, and organized facilities is necessary for the effective
advance of knowledge through applying factor analysis to establishing
functional unities in behavior. One needs, first, research time,
resources and subjects enough to permit lengthy measurement of a
large range of variables; second, a research team with talents in the
direction of proceeding from general theoretical concepts about
personality to actual miniature-situational objective test designs;
third, a sure touch in the finer statistical issues in the area of
multivariate [...] trained, is far from [...] conditions are positively
rare; so it is not surprising that there are fewer than half a dozen
laboratories in the English-speaking world, and none that I know
of outside it, where this basic research is being intensively pursued.

However, although such research centers cannot easily be expanded
to the requisite number, there is no need for them to be so
few as they are. Any large university psychology department should 
be able to organize an effective laboratory in this area. Viewed in 
broader terms of national effort there are unmistakable similarities 
to our backwardness in the area of intercontinental ballistic mis- 
siles. Both have the pattern of insufficient planning of funds and 
facilities for the scale of work required, and the lack of ability to 
bring representatives of different departments together. In our case 
the coordination failure has shown itself especially, until recently, 
in obtaining strong teams combining clinicians and multivariate sta- 
tistical experimenters. 

The second necessary objective in the organization of research 
is the expediting of external cultural validation of these functional 
unities, once they are established and have had good tests set up 
for them in the laboratory. I believe it cannot be too much stressed 
that this is a task which cannot effectively be done by the same 
team or organization as had been designed for the basic internal 
validation just described. Instead this is the proper field for the 
vaster group of professional, applied psychologists in clinical, edu- 
cational, and industrial research. There is always a lag between 
the conclusion of laboratory research and its use in the field, and 
one wonders if this lag could not, with a little better cooperation, 
be cut down from ten to five years. We all know the theory that 
if a man in the backwoods invents a better mousetrap the world 
will, in a few days, make a beaten track to his door, but in an age 
of advertising and vested interests he is more likely to be paid to 
bury the invention. 

Through the momentum of custom alone, and the ego involve- 
ments of personal prowess with ink blots or Binet, the majority of 
clinical and educational psychologists are inclined to continue with 
the instruments they were taught to use at college, though instru- 
ments of twice as high a validity may be open to evaluation in re- 
search reports. For example, many clinicians are only just beginning 
to realize that the factored questionnaire of today is something quite 
different from the ad hoc questionnaires of former years, and there 
is quite a good probability that it would give them better clinical
diagnoses and prognoses than are obtainable from their current
tools. Others, failing to realize the modern demands for research
specialization which I have just stressed, seem to expect that the
factor analysts will not only investigate structure but also supply 
them with the clinical validities of such tests, and they sit back 
and wait, ignoring their own vital role in test development. The
test construction itself is today a full-time and highly specialized
task. The amount of planning, skill, and labor involved in factorizing
literally hundreds of variables, checking by replication the
factor structure in independent studies, and constructing unifactor
scales from such variables is enormously greater than that involved
in the older style questionnaires and tests, which most of us could
make up almost overnight. It is, however, true that the factor
analyst has usually been content to dump his finished product before
the clinician in the journals and to return to his computer and his
laboratory.

Unfortunately, even the applied psychologist who realizes his
role in the teamwork of science has been inclined to look at this
abstract contrivance with about as much enthusiasm and insight
as a Bikini native looking at an atom bomb. He rightly fears that
it is something which will involve radical changes in his mode of
practice and thinking. Often he is inclined to defend himself from
having to think in objective structural concepts by saying that a
factor is an artificial mathematical monstrosity which will have no
potency in his human clinical world. The result is that though a
number of well factored tests highly relevant to clinical practice
have become available over about the last five years, the activity
which should have led to their external validation has been utterly
inadequate. The important point, however, is that on the few
occasions when their external validity has been crucially tried, it has
turned out to be very good.


Turning from practice to basic theory, one notes that these external
validities are vital to the full [...] factors and structural
relations for [...] in the laboratory alone. In the case [...] leading
to great strides [...] only five years ago [...] the alphabet, just
like [...] versus ego weakness, this hypothesis only received the
degree of confirmation [...] face groups, that it [...]
among students of the same intelligence level, that it is negatively
correlated with accident proneness, that it is substantially negatively 
correlated with anxiety proneness, and so on. 

Similarly, the finding that high F factor, or surgency-vs-desurgency,
is substantially positively correlated with being chosen and
voted a group leader, that it has one of the principal loadings in the 
second order extraversion factor, that it increases with alcohol, that 
it declines steadily with age from adolescence to middle age, that 
its level is largely a product of environment rather than heredity, 
that it increases significantly under frontal lobotomy and under 
psychotherapy, provided valuable extension of the original factor 
hypothesis that desurgency is a form of generalized inhibition, as- 
sociated with frontal lobe action and with frequency of punishing, 
repressive past experience. This degree of insight into its nature 
could never have been achieved from the direct content of the 
factor, either in ratings or in the questionnaire responses. 

Accordingly, the great need in the social organization of research 
at the present moment is a concerted plan for taking all factor 
analytically well established personality source traits and having 
their social validities, their changes with age, their relevance to 
clinical prognoses, their educational predictive value, etc., system- 
atically examined. No one clinical, counseling, or other applied 
psychological center can hope to do this alone or for all the factors. 
But a planned division of labor, in which certain laboratories or 
clinical centers make systematic studies of the life history of one 
factor and others of another could lead to an enormous increase in 
the practical effectiveness of personality measurement in applied 
psychology in the next five or ten years.

In conclusion, I hope I have given some convincing reasons why 
the construction of personality measurement scales should be 
wedded to concepts of personality structure, and some evidence that 
the objective structuring of personality has come of age sufficiently 
to make this possible. How soon this marriage will be fruitful, in 
terms of major gains in the power and insightfulness of applied 
psychology, depends on how soon teachers of applied psychology
cease thinking in terms of catalogues of tests and set out to teach
tests and measurements as an epilogue to courses in personality.
The psychology of structure and growth comes first: the tests are
merely an appendix to such an exposition. If after all this discussion,
you were to ask me why I personally prefer factor scales to other
scales, e.g., simple homogeneous scales, I think I should have to
say because the former are psychologically interesting and the latter
are dull. When you are through with a complicated scaling ritual 
you have perhaps at best eased a neurotic compulsion; but with 
factor scales you can have a lot of fun finding how people tick. 
Differential Validity in Some
Pattern Analytic Methods


Louis L. McQuitty
Michigan State University 


A THEORY of personality structure
is a starting point for the development of numerical methods for
the objective assessment of personality. This chapter starts with a
simple-minded theory concerning the way in which personality is
structured. It outlines the theory and traces the development of
a series of pattern analytic methods that have derived logically from
the theory. The methods can be used to investigate the fruitfulness
of the theory.


Most clinical theories accept the concept of syndromes of symptoms.
A syndrome of symptoms is a combination of characteristics
that implies a disease of some kind. If a person is mentally ill, then
the manifestation of this condition can presumably be described
as a syndrome of behavioral symptoms. For every disease
there is a unique syndrome of symptoms. There is presumed to be
a one-to-one correspondence between disease entities on the one
hand and syndromes of symptoms on the other. The process of
diagnosis is to discover the syndrome of symptoms portrayed by a
patient and then assign to him the disease corresponding to that
syndrome.

An examination of syndromes of symptoms reveals that many
symptoms are common to more than one syndrome. There is not a
one-to-one correspondence between symptoms and syndromes. In
other words, there is not one symptom that is unique to each
syndrome, such that if a given symptom is present then a corresponding
syndrome is known to be present. Nearly every symptom can and
does occur in more than one syndrome. Analogously, almost every
symptom can and does occur as a manifestation of more than one
disease; there is not a one-to-one correspondence between symptoms
and diseases.

In the field of mental health, symptoms are characteristic re- 
sponses, and syndromes of them are patterns of characteristic re- 
sponses. Following this translation still further, mental diseases 
are personality types. This approach gives a particular definition 
to the concept of a personality type. The personality type is the 
internal property that causes a person to portray a particular pattern 
of responses. It is a hypothetical construct; it is assumed in order to
explain why an individual gives a particular pattern of responses, 
just as a disease is interpreted to be the cause of a particular syn- 
drome of symptoms even when nothing more than the syndrome 
has been observed in the patient. 

There is presumed to be a one-to-one correspondence between 
personality types and patterns of responses, such that if a given 
pattern of responses is characteristic of a person, it means that the 
individual possesses a particular personality type. There is not,
however, presumed to be a one-to-one correspondence between
individual responses and personality types. Rather, most individual
responses are presumed to be characteristic of more than one type;
different types can cause the same response. As a consequence,
a given response sometimes means one personality type and sometimes
another type, just as a given symptom sometimes means one
disease and at other times a quite different disease.

Following the analogy of personality types to disease entities still
further, it is helpful to realize that a person can have more than one
disease at a time. Analogously, it is assumed that a person can be
characterized by more than one personality type. In fact, it is
assumed that he can be characterized by many personality types.
The number of types desirable to attribute to a person will depend
on the level of abstraction that we wish to achieve in classifying
people.


THE CLASSIFICATION PROBLEM 


One problem with which we are concerned is the development
of numerical methods that can start with the symptoms characteristic
of patients and that can be used to classify the patients
objectively into meaningful disease categories. An analogous problem
is to start with responses to the individual items of a test, using 
these to classify the subjects into meaningful personality types. 

The classification problem is complicated by the lack of a one-to-one
correspondence between responses and personality types. In one
person, the response to a given test item may be determined by
one personality type but in another person the same response to
the item may result from a different personality type. For example,
a medical type of person may respond correctly to a question about
chemistry because he has learned it in the advanced study of medicine.
A chemical engineering type of person, on the other hand, may
respond correctly to the same item because he learned it in the
advanced study of engineering, but the two types of persons would
possess this identical knowledge in different patterns of other
information about physiology and mathematics.

The fact that the correct answer to an item is caused by two
different personality types means that it has differential validity. In
the one case, it indicates a medical type, and in the other it indicates
an engineering type. In order to know which of these two types is
indicated by a correct answer to the chemical item, we must study
the answers to other items. If the correct answer to the chemical
item occurs in a pattern of correct answers about physiology and
incorrect ones about mathematics we then know that the correct
chemical answer indicates a medical rather than an engineering
type.
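
The chemistry example can be put in numerical form. The sketch below is only an illustration under invented assumptions: the four subjects, the item order (chemistry, physiology, mathematics) and the helper name types_given are hypothetical and do not come from any study cited here. It tabulates which types occur for a given response pattern, first on the chemistry item alone and then on all three items.

```python
from collections import Counter, defaultdict

# Hypothetical subjects: (chemistry, physiology, mathematics) item results
# (1 = correct, 0 = incorrect) together with an illustrative type label.
subjects = [
    ((1, 1, 0), "medical"),
    ((1, 1, 0), "medical"),
    ((1, 0, 1), "engineering"),
    ((1, 0, 1), "engineering"),
]

def types_given(item_indices):
    """Tabulate the type labels observed for each response pattern
    on the chosen items."""
    table = defaultdict(Counter)
    for responses, label in subjects:
        pattern = tuple(responses[i] for i in item_indices)
        table[pattern][label] += 1
    return {pattern: dict(counts) for pattern, counts in table.items()}

# The chemistry item alone cannot separate the two types ...
chem_only = types_given([0])
# ... but the pattern over all three items can.
full_pattern = types_given([0, 1, 2])

print(chem_only)
print(full_pattern)
```

In this toy table the correct chemistry answer is shared equally by both types, while each full pattern belongs to one type only, which is the sense in which the single response has differential validity and the pattern has invariant validity.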


DIFFERENTIATION VERSUS DISCOVERY OF TYPES


Because the chemical answer has high validity for indicating 
both engineering and medical types, it necessarily has low dis- 
crimination for differentiating between the two types; it would be 
discarded as an item in a test designed to differentiate between the 
two types. Instead we would use the mathematical and physiological
items in differentiating between engineers and doctors. But
we are concerned here with a more difficult problem in objective
classification. In the example, we assumed that the engineering and
medical types are given; we know who are doctors and who are
engineers. Instead of starting with known categories of subjects,
we wish to start with characteristic [...]
invariantly valid in the sense that they measure the same thing or
things for all persons. This latter kind of item, with invariant 
validity, is the kind desired in most test construction methods, as 
illustrated especially well in factor analysis. In factor analysis, an 
item is assumed to measure the same thing for all people. This is 
not to say that it measures only one factor; it may in fact assess 
several factors, as indicated by loadings on several factors. The 
point is that whatever an item measures it is assumed to do this 
with near equal efficiency for all people of the universe under study. 
In this sense, its degree of validity is invariant; it measures nearly 
equally well the same stuff for all people. 

Items of invariant validity are not the ones with which to start
in an effort to isolate types, when the types are to be determined
by the responses to test items themselves. The reason for this is
that we have assumed there is not a one-to-one relationship between
types and responses. There is no response which is known to mean
one type and only one type.

Not being able to use items of high invariant validity, we seek 
then the next best thing, viz., items of high differential validity. 
Responses to these items manifest types but they manifest different 
types in different people; this is the sense in which they have 
differential validity. Even though the items by themselves have
differential validity, patterns of responses to them are presumed
to have invariant validity; each pattern of responses to these items
is presumed to mean one and only one personality type, thus
giving a one-to-one correspondence between types and patterns.

Our purpose is to isolate response patterns which have invariant
validity with respect to types. In order to do this we first attempt
to define a type. In this effort, there are alternative ways of
proceeding. We could, for example, attempt to define types in such a
way that we could recognize their manifestations observationally
in people. We could then observe people and select representative
types. We could study them and attempt to write items to which
the types would give differential patterns of answers. However, the
observational isolation of types has not proved particularly fruitful,
historically.

We have used a different initial approach in our efforts to study
types. Eventually we will wish to combine the two approaches,
using them in mutually assistant fashions. Our approach is to give
a statistical-like definition to both patterns of responses and
personality types. We use the statistical definition to enable us to
develop the techniques of analysis for isolating patterns or types.
Then we propose to study both the response patterns and
representatives of the types in order to learn more about the
characteristics of the types. Our first patterns will doubtlessly be
[...] and overlapping but nevertheless subject to refinement, elaboration,
and improvement through repeated application of the methods
used.

We first assume that we have a test that contains items of
differential validity. Considerable time should be given to the
selection of such items in terms of typological theories. We have not
yet addressed ourselves to this problem, being concerned initially
with the statistical definition of types and the methods of numerical
analysis for isolating both patterns and types.


SOME PATTERN-ANALYTIC METHODS


Since several pattern-analytic methods have already been
developed, it will be helpful to review two of them in relation to our
set of assumptions before outlining our statistical definitions of
types and the methods of analysis which flow from them.

Two major kinds of pattern-analytic methods are appropriate to
two different classes of data, ordered and unordered. The responses
to the individual items of a test are an illustration of unordered data,
at least until the responses have been allocated to a scale according
to some operational definition. After allocation to a scale, responses
to the items then illustrate ordered data.


Profile Analysis 


One general pattern-analytic [...] items are first ordered to a scale
[...] allocate people to the scale [...]
from those of B, except for Scale 3 on which the subjects of both
A and B all have the same standing as shown in Figure 1.

If we assume that the profiles of categories A and B are manifestations
of Types I and II respectively, the common standing on Scale
3 has two different meanings. In profile A, it means Type I, but in
profile B, it means Type II. This result shows that the standing has
differential validity. The items of a scale, however, are usually
chosen to minimize differential validity. In building a scale, we
usually attempt to define a unitary trait that is common to all people,
and then we attempt to select items that measure this trait in all of
our subjects; we attempt thereby to select items with high invariant
validity, but items must have relatively low differential validity
to the extent that they have high invariant validity; if an item
measures one thing well for all subjects, as required by invariant
validity, it cannot then measure more than one thing as required by
differential validity. Efforts to select items with invariant validity in
building scales necessarily limit the potentiality of differential
validity. If differential validity is nevertheless found with such scales,
the result suggests the worthwhileness of searching for items with
high differential validity and analyzing them in a manner particularly
designed to discover the manifestation of types.

FIGURE 1. Hypothetical Profiles Illustrating Differential Validity for the
Common Standing on Scale 3. (Profiles are plotted over Scales 1 to 5;
solid lines are profiles for individuals of Type I in Category A, broken
lines are profiles for individuals of Type II in Category B.)

Lubin’s Approach


Not all pattern-analytic methods which have been applied to 
unordered data, viz., the responses to the individual items of a test, 
have been applied in a manner that selects items with high differ- 
ential validity. Some pattern-analytic approaches have been applied 
instead in ways which maximize invariant validity and thereby tend 
to minimize differential validity. An example is a study by Lubin 
(2) in which he used what I have called an accumulative method of 
pattern analysis (4). 

Lubin’s approach involved selecting items in relation to an ex- 
ternal criterion for a pattern-analytic method of scoring. He first 
selected from a group of many items the one item x that was most 
highly related to the external criterion. Next he treated this item 
successively in a pair with every other item until he found the one 
pair of items x and y that had the highest pattern score with the 
criterion. Proceeding in an analogous fashion, he retained items 
x and y and tried them successively with every other item until he 
found the triplet x, y, and z that had the highest pattern score with 
the external criterion. Thus he selected his items accumulatively 
one item at a time. By selecting the one item with the highest re- 
lationship to the criterion, however, he selected an item with high 
invariant validity. This action not only limited the differential 
validity of the item selected but also the interaction variance it 
can have with other items and consequently the differential validity 
of the next item selected. 

In the Lubin approach, the first pair of items must not only 
include the first item (with high invariant validity), but the pair 
itself must have high invariant validity with the criterion, thereby 
limiting the interaction variance that later items can have with those 
already selected. This whole process continues to limit interaction 
variance. Since interaction variance is the essence of differential 
validity, differential validity is continually limited throughout the 
selection process.
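
The accumulative selection just described can be sketched in code. What follows is not Lubin’s own computation: the text does not specify how the pattern score was computed, so the sketch assumes a correlation ratio (the proportion of criterion variance accounted for by the response patterns), and the names pattern_score and accumulate_items, together with the eight invented subjects, are hypothetical. In the invented data, items 1 and 2 are unrelated to the criterion singly but determine it jointly, while item 0 is moderately related to it alone.

```python
from collections import defaultdict

def pattern_score(data, criterion, items):
    """Proportion of criterion variance explained by the response patterns
    on `items` (a correlation ratio, assumed here as the pattern score)."""
    groups = defaultdict(list)
    for row, y in zip(data, criterion):
        groups[tuple(row[i] for i in items)].append(y)
    grand_mean = sum(criterion) / len(criterion)
    total = sum((y - grand_mean) ** 2 for y in criterion)
    if total == 0:
        return 0.0
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    return between / total

def accumulate_items(data, criterion, n_select):
    """Select items one at a time, each time keeping the items already
    chosen and adding the item that gives the best pattern score."""
    selected = []
    remaining = list(range(len(data[0])))
    while remaining and len(selected) < n_select:
        best = max(
            remaining,
            key=lambda i: pattern_score(data, criterion, selected + [i]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Invented data: the criterion is 1 exactly when items 1 and 2 disagree.
data = [
    (0, 0, 0), (1, 0, 1), (1, 1, 0), (0, 1, 1),
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1),
]
criterion = [0, 1, 1, 0, 0, 1, 1, 0]

chosen = accumulate_items(data, criterion, 2)
print(chosen, pattern_score(data, criterion, chosen))
print([1, 2], pattern_score(data, criterion, [1, 2]))
```

Because the first step must take the singly best item, the pair finally chosen in this toy case scores lower than the pair made up of the two purely interacting items, which illustrates how an accumulative procedure can limit interaction variance.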

This outline of influence on validity of the Lubin method of
selecting items for pattern-analytic scoring is not to argue that the
[...] have been performed. On the contrary, [...] an effective use of
the extreme case. If items of this kind had proved more [...] methods
of selection and scoring, it would [...] endorsement of all items for
pattern-analytic [...] did not find his approach to be superior
to the usual methods. We only have evidence that items with
relatively high invariant validity do not yield unusually promising
pattern-analytic scores, leaving the possibility that items may still
be found with differential validity and promising pattern-analytic
scoring.


METHODS FOR SELECTING ITEMS WITH DIFFERENTIAL VALIDITY 


In our own approaches the methods are appropriate to the selec- 
tion of items with high differential validity, where differential va- 
lidity is defined to mean that an item response is determined by 
different internal constructs in different people. The occurrence of 
the same item response in two different patterns of responses to 
other items is assumed to be tentative evidence that an item re- 
sponse is determined by two different constructs. Thus, if two sub- 
jects both answer a difficult chemical question correctly but the 
first subject also answers many physiological questions correctly 
while failing mathematical items, and the other subject answers 
the mathematical questions correctly while failing the physiological 
items, we might say that the first subject answered the chemical 
question correctly because he was a medical type, and the second
subject answered correctly because he was an engineering type.
We would thus be treating the concept of “type” as a postulated,
internal construct, attributing to it the power of determining
patterns of answers to items.

By seeking item responses with differential validity as evidenced by
their occurrence in different response patterns, we are in fact seeking
items with high interaction variance. We want item responses
which have various meanings depending on the combination of
other item responses with which they occur.
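
A small numerical case may make the notion of interaction variance concrete. The sketch assumes an invented population of four equally frequent response patterns on two items, with Type I defined as agreement between the two responses; the function names are hypothetical. Each item taken singly has zero covariance with type membership, yet every pattern points to exactly one type.

```python
from itertools import product

# Invented population: every combination of two binary item responses,
# equally frequent. The type is "I" when the responses agree, "II" otherwise.
population = [
    ((a, b), "I" if a == b else "II") for a, b in product([0, 1], repeat=2)
]

def covariance_with_type(item_index):
    """Covariance between one item response and membership in Type I."""
    xs = [resp[item_index] for resp, _ in population]
    ys = [1 if label == "I" else 0 for _, label in population]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def pattern_is_unambiguous():
    """True when every response pattern maps to exactly one type."""
    seen = {}
    for resp, label in population:
        seen.setdefault(resp, set()).add(label)
    return all(len(labels) == 1 for labels in seen.values())

print(covariance_with_type(0), covariance_with_type(1))
print(pattern_is_unambiguous())
```

Here all of the information about type lies in the combination of responses and none in either response alone, the extreme case of an item whose meaning depends wholly on the other responses with which it occurs.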

In developing our approaches, we have assumed that typological 
theories are relatively inadequate; we suspect that they do not now 
describe the types that will ultimately prove most fruitful in
objective, numerical analyses. Not knowing the nature of types to
hypothesize, we have tried to develop methods that would depend 
maximally on the concatenation in the data and minimally on 
assumptions implicit in the method of analysis. 

We have not known the level of abstraction to apply in the isolation
of types. Consequently, we have developed methods for a
hierarchical classification of response patterns. At the lowest level
of classification, there is very little abstraction; the subjects are
classified into many categories, and every category contains few
subjects, all of whom have relatively many common responses. As
the classification proceeds to successively higher and higher levels
of classification, there is more abstracting; there are fewer categories
of subjects and every category contains more subjects who have
fewer common responses. In other words, at the lower levels of
classification, relatively unique characteristics are used in
determining the many types. As the classification proceeds, the
relatively unique characteristics are disregarded in favor of more
general ones which are descriptive of larger categories of people.
Thus in the course of the analyses we proceed from the unique
individual to relatively unique types; then to more and more generalized
types, until at the top level of classification we may have in
the extreme case only one type of person and all members with
only a few common responses. This approach makes it possible to
compare the types of the successive levels in terms of such
considerations as meaningfulness, statistical significance, reliability,
validity, and the prediction of criteria, thereby providing insight into
which types might most fruitfully be regarded as in some sense
“real.”
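
The way categories grow while their common responses shrink can be shown with a generic agglomerative sketch. It is not the procedure of any of the agreement analyses discussed in this chapter: it simply joins, at each step, the two categories sharing the most responses. The four subjects, their answers, and the name build_hierarchy are invented for illustration.

```python
def build_hierarchy(subjects):
    """Merge categories level by level, always joining the two categories
    that share the most responses. Returns a list of levels; each level is
    a list of (members, common_responses) pairs."""
    categories = [((name,), frozenset(resp)) for name, resp in subjects.items()]
    levels = [categories]
    while len(categories) > 1:
        i, j = max(
            (
                (a, b)
                for a in range(len(categories))
                for b in range(a + 1, len(categories))
            ),
            key=lambda ab: len(categories[ab[0]][1] & categories[ab[1]][1]),
        )
        merged = (
            categories[i][0] + categories[j][0],   # all members of both
            categories[i][1] & categories[j][1],   # responses common to all
        )
        categories = [
            c for k, c in enumerate(categories) if k not in (i, j)
        ] + [merged]
        levels.append(categories)
    return levels

# Invented responses, written as (item, answer) pairs.
subjects = {
    "A": {("q1", "yes"), ("q2", "yes"), ("q3", "no"), ("q4", "no")},
    "B": {("q1", "yes"), ("q2", "yes"), ("q3", "no"), ("q4", "yes")},
    "C": {("q1", "no"), ("q2", "yes"), ("q3", "yes"), ("q4", "yes")},
    "D": {("q1", "no"), ("q2", "yes"), ("q3", "yes"), ("q4", "no")},
}

levels = build_hierarchy(subjects)
for level in levels:
    print([(members, len(common)) for members, common in level])
```

In this toy case the lowest level has four categories of one subject with four responses each, the middle levels have pairs with three common responses, and the top level has a single category whose members share only one response.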


DEFINITIONS OF TYPES 

By the approach just described, we have attempted to give
maximal influence to the data in determining what is to constitute
“real” types. Nevertheless, in developing objective numerical
procedures for the isolation of types, we must give sufficient definition
to the concept of type so that statistical operations will flow logically
from the definition.

Binary Types 
[...] level. The method which derives from these [...]
types is called binary agreement analysis (6).

[...] classification by twos in the definition of types [...]
a typological theory; it was done, instead, in order to
simplify the statistical operations and thus facilitate the
development of a numerical method for classifying people, realizing
that an approximate method might enhance the development of a
more sophisticated solution.


FIGURE 2. A Simplified and Incomplete Linnaean Chart. (Levels shown
include Class, Orders, Families, and Individual Organisms.)


In developing and applying binary agreement analysis a particular
problem stood out as a result of features within all sets of data
analyzed: not all highest agreement scores were reciprocal. Consider,
for example, the matrix of Table 1.

In this matrix subject A has his highest agreement with subject B,
and B, in turn, has his highest score with A; this is a reciprocal
agreement score in the sense that A is not only highest with B but
B is also highest with A. Some highest agreement scores are
non-reciprocal, as illustrated in the one mediating between
individuals I and J; I is highest with J, but J is highest with K.

The definition of types as used in binary agreement analysis was
entirely adequate so long as highest agreement scores were
reciprocal, but when they were non-reciprocal, a problem arose. For
example, if I were classified with J because it is highest with J,
then J could not be classified with K, with which it is highest,
i.e., so long as we continued to classify in pairs exclusively. In
binary agreement analysis, we solved this problem arbitrarily. We
classified J with K rather than J with I if and only if the score for
J-K were larger than the one for I-J. Individual I was then
classified in terms of its second highest agreement score, at the
species level, i.e., with L in the matrix of Table 1. In cases such as
this, species J-K and I-L usually come together at the genus level to
solve the problem in a more meaningful manner at the second level
of classification.
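
The pairing rule for reciprocal and non-reciprocal scores can be illustrated with a short sketch. The Python below is only an illustration and not the published procedure of binary agreement analysis: it takes the scores among individuals I, J, K, and L as read from Table 1, reports which highest agreement scores are reciprocal, and then pairs individuals by repeatedly taking the largest remaining score, which is one assumed way of giving J-K precedence over I-J. All function names are hypothetical.

```python
# Agreement scores among I, J, K, L as read from Table 1 (symmetric).
SCORES = {
    ("I", "J"): 108, ("I", "K"): 106, ("I", "L"): 107,
    ("J", "K"): 110, ("J", "L"): 106, ("K", "L"): 108,
}


def score(a, b):
    """Look up the agreement score for a pair in either order."""
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]


def highest_partner(person, people):
    """Return the individual with whom `person` has the highest score."""
    others = [p for p in people if p != person]
    return max(others, key=lambda p: score(person, p))


def reciprocal_pairs(people):
    """Pairs in which each member is the other's highest partner."""
    pairs = set()
    for p in people:
        q = highest_partner(p, people)
        if highest_partner(q, people) == p:
            pairs.add(frozenset((p, q)))
    return pairs


def pair_by_twos(people):
    """Assumed pairing rule: take the largest remaining score each time."""
    remaining = sorted(people)
    pairs = []
    while len(remaining) >= 2:
        candidates = [(a, b) for i, a in enumerate(remaining)
                      for b in remaining[i + 1:]]
        best = max(candidates, key=lambda ab: score(*ab))
        pairs.append(best)
        remaining = [p for p in remaining if p not in best]
    return pairs


people = ["I", "J", "K", "L"]
print(reciprocal_pairs(people))  # only J-K is reciprocal
print(pair_by_twos(people))      # J with K first, then I with L
```

With these four individuals the sketch pairs J with K and leaves I to be paired with L, its second highest score, as in the text.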


Generalized Agreement Analysis 


The joint occurrence of reciprocal and non-reciprocal scores led
naturally to the concept of multiple reciprocity where I, J, K, and L,
for example, might all be highest, each with respect to every other
one; the scores would all be tied and they would be larger than any
score which any one of the individuals would have with any other
individual. However, empirical data do not generally reveal this
condition, yet theories of types argue for it. We assumed that the
failure of empirical data to reveal the condition is due to chance
error in raw data; we therefore developed a method, called
generalized agreement analysis (3), that includes a technique for
correcting agreement scores for chance errors.

Once we had introduced the concept of corrected agreement


TABLE 1


AGREEMENT SCORES BETWEEN INDIVIDUALS: HYPOTHETICAL DATA


     A    B    C    D    E    F    G    H    I    J    K    L
A       115  110  105  100  109  104   88   44   40   35   30
B  115       109  104   99  110  105   89   43   39   34   2?
C  110  109       103   98  102  102   85   42   38   33   28
D  105  104  103        99  101  103   84   40   37   34   29
E  100   99   98   99       100  101   84   40   38   32   30
F  109  110  102  101  100       103   83   41   36   33   28
G  104  105  102  103  101  103        99   38   37   31   29
H   88   89   85   84   84   83   99        57   58   54   53
I   44   43   42   40   40   41   38   57       108  106  107
J   40   39   38   37   38   36   37   58  108       110  106
K   35   34   33   34   32   33   31   54  106  110       108
L   30   2?   28   29   30   28   29   53  107  106  108
scores we were able to use a more realistic definition of types in
the development of statistical methods for isolating them. We
defined a species as a category of subjects of such a nature that
everyone in a category is more like what is common to members of the
category than he is like anyone in any other category. In this
approach, a species of two often has a larger corrected agreement
score when it grows into one of three, because of the greater
dependability of agreements by three individuals over those in
two. This fact enables individual I of Table 1 to have a higher
corrected score with J-K than it does with L, thus requiring that I
be classified initially in a category with J-K, with which it has its
highest corrected agreement score.


Elementary Linkage Analysis 
A criticism of generalized agreement analysis is that it is
laborious. However, the fact that it works enables us to propose a
simple definition of types which can be applied to a matrix of
scores to yield a rapid, numerical method for isolating types.
The method is called elementary linkage analysis (5). A type is
defined as a category of people of such a nature that everyone in
a category is more like someone else in the category than he is like
anyone in any other category. This definition is applied easily in
classifying people into categories once one has a matrix of
agreement scores (or some other index of likeness) between people.
Consider the matrix of Table 1 again. The first step is to underline the
highest entry in each column, as has already been done in the table.
We then select the highest entry in the entire matrix. In this
example it is 115 and mediates between individuals A and B. These
individuals are shown in Figure 3 with a double arrow pointing
between them to indicate that they are a reciprocal pair. The
highest entry in a matrix is always reciprocal. The next step is
to find all individuals who have either A or B most like them.
This is done by reading across the rows of individuals A and B in
Table 1, and thus finding that C, D, and E have A most like them;
they are classified in Figure 3 with A. F and G, having B most like
them, are classified with B. These new additions to the type,
individuals C through G, are called first cousins because they join
directly to a member of the reciprocal pair. We then examine the
rows of the first cousins to see if anyone has these individuals most
like them; we find that H has G most like it. No other individual
has a first cousin most like it, and consequently no other first cousin
brings another individual into the type. Row H of Table 1 is
examined and it is found that there is no individual who has second
cousin H most like it. Consequently, we have exhausted the first
type; no other individuals classify into this type. Individuals A
through H have now all been withdrawn from the matrix of Table
1 to constitute the first type. The method is then repeated with the
reduced matrix to isolate the next type, etc., until all individuals
are classified. The one reciprocal pair in the reduced matrix
mediates between individuals J and K, which are joined by first cousins I
and L to complete the second type as shown in Figure 4 and exhaust
the matrix. In this approach everyone is classified into a category
so that he is more like some person in the category than he is like
anyone in any other category.


Figure 3. Type I.


I -> J <=> K <- L


Figure 4. Type II.
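
The linkage steps described above lend themselves to a brief computational sketch. The Python below is a minimal illustration under stated assumptions, not the authors' own program: the matrix is a transcription of the upper half of Table 1, the B-L cell is not legible in the source and is assumed here to be 29 (it does not influence any linkage), ties are broken by order of appearance, and the names `UPPER` and `elementary_linkage` are hypothetical.

```python
LABELS = "ABCDEFGHIJKL"

# Upper triangle of Table 1, row by row (B-L assumed to be 29).
UPPER = {
    "A": [115, 110, 105, 100, 109, 104, 88, 44, 40, 35, 30],
    "B": [109, 104, 99, 110, 105, 89, 43, 39, 34, 29],
    "C": [103, 98, 102, 102, 85, 42, 38, 33, 28],
    "D": [99, 101, 103, 84, 40, 37, 34, 29],
    "E": [100, 101, 84, 40, 38, 32, 30],
    "F": [103, 83, 41, 36, 33, 28],
    "G": [99, 38, 37, 31, 29],
    "H": [57, 58, 54, 53],
    "I": [108, 106, 107],
    "J": [110, 106],
    "K": [108],
}

# Build a symmetric lookup keyed by ordered pairs of labels.
SCORE = {}
for i, p in enumerate(LABELS[:-1]):
    for q, value in zip(LABELS[i + 1:], UPPER[p]):
        SCORE[p, q] = SCORE[q, p] = value


def elementary_linkage(labels, score):
    """Isolate types: start from the highest entry, then add cousins."""
    remaining = list(labels)
    types = []
    while remaining:
        if len(remaining) == 1:
            types.append(set(remaining))
            break
        # Highest entry for each individual within the reduced matrix.
        nearest = {
            p: max((q for q in remaining if q != p),
                   key=lambda q: score[p, q])
            for p in remaining
        }
        # The highest entry in the whole matrix defines a reciprocal pair.
        start = max(remaining, key=lambda p: score[p, nearest[p]])
        members = {start, nearest[start]}
        frontier = set(members)
        # First cousins, second cousins, and so on, until none are added.
        while frontier:
            frontier = {p for p in remaining
                        if p not in members and nearest[p] in frontier}
            members |= frontier
        types.append(members)
        remaining = [p for p in remaining if p not in members]
    return types


types = elementary_linkage(LABELS, SCORE)
print([sorted(t) for t in types])
```

Run on this transcription, the sketch isolates one type containing A through H and a second containing I, J, K, and L, in agreement with Figures 3 and 4.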


When everyone has been classified into a category, the types may 
be called species, arbitrarily. One can then take an index of as- 
sociation between the species and repeat the process to classify the 


species into genera and analogously for higher levels of classifica- 
tion. 


Successive Agreement Analysis 


A defect in elementary linkage analysis is the fact that the initial 
classification for every individual depends on indices of association 
between pairs of people; these are subject to error, and mistakes
in classification at the first level might be reflected at later
levels.

After having developed elementary linkage analysis, our purpose 
was to develop a rapid method which could use large sets of data 
and classify in depth with maximal validity. An electronic computer 
method was developed for this purpose. A type is here defined as 
a category of people of such a nature that everyone in a category
is more like what is common to the members of his category than
he is like what is common to the members of any other category
of the same size. It is called successive agreement analysis to
indicate that an individual is first classified with the one person
with whom he has most in common, then with the two persons
with whom he has most in common, then with the three persons
with whom he has most in common, etc.

Successive agreement analysis as practiced is not as
comprehensive as the definition just given would seem to imply. Not everyone
is necessarily classified with the one person with whom he has most
in common; it is done, instead, only for those who have the highest
agreement scores; these are the most dependable classifications.
Likewise, not everyone is necessarily classified with the pair of
persons with whom he has the most in common; only those that have the
highest agreement scores with the most dependable pairs are thus
classified. A similar failure to classify everyone at the quadrad and
even the quintad level may occur, but eventually everyone is
usually best classified at some rather early level and then usually
at all subsequent levels.

Successive agreement analysis starts with a matrix of agreement
scores such as shown in Table 1. Then the N highest agreement
scores are selected out, where N is some arbitrary number, usually
maximal for computer capacity, and often equal to the number of
subjects represented in the matrix. In some cases, every individual
will be represented in at least one of the N highest agreement scores,
but in other cases some one or more subjects will be omitted while
others are represented several times. Nevertheless, these N highest
diadic scores are the ones with which individuals are apt to have
their highest agreement at the triadic level. The next step is to
compute the agreement score between every individual and every
pair to produce a matrix of triadic scores as shown in Table 2. The
N highest triadic scores are then selected out.

Usually these will include the classification of some individuals
who were omitted in the classification at the diadic level. The
process proceeds in an analogous fashion at successive higher and higher
levels representing classification in depth and allowing an
investigator to make comparative studies of the successive levels in terms
of such considerations as predictive abilities, reproducibility and
psychological meaningfulness.
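
The successive selection of the N highest scores at each level can be sketched briefly. The Python below is an illustration under stated assumptions rather than the computer method itself: the agreement score of a group is assumed to be the number of items on which all its members answer alike (the text does not define it at this point), the response data are invented, and the names `agreement` and `successive_agreement` are hypothetical.

```python
from itertools import combinations


def agreement(responses, group):
    """Count the items on which every member of the group answered alike."""
    rows = [responses[person] for person in group]
    return sum(1 for answers in zip(*rows) if len(set(answers)) == 1)


def successive_agreement(responses, n_keep, max_size):
    """Keep the n_keep highest-scoring groups at each size from 2 upward."""
    people = sorted(responses)
    candidates = {frozenset(pair) for pair in combinations(people, 2)}
    kept_by_size = {}
    for size in range(2, max_size + 1):
        # Rank by descending agreement; break ties alphabetically.
        ranked = sorted(
            candidates,
            key=lambda g: (-agreement(responses, g), sorted(g)),
        )
        kept = ranked[:n_keep]
        kept_by_size[size] = kept
        # Next level: each kept group joined by each individual outside it.
        candidates = {g | {p} for g in kept for p in people if p not in g}
    return kept_by_size


# Invented true-false answers of five subjects to five items.
RESPONSES = {
    "A": "TTFFT",
    "B": "TTFFF",
    "C": "TTFTT",
    "D": "FFTTF",
    "E": "FFTTT",
}

levels = successive_agreement(RESPONSES, n_keep=3, max_size=3)
for size, groups in levels.items():
    print(size, [("".join(sorted(g)), agreement(RESPONSES, g)) for g in groups])
```

Only groups built on the retained pairs are examined at the triadic level, which is the source of the assumption discussed in the following paragraph of the text.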

It is essential to emphasize one characteristic of successive
agreement analysis, for it depends on an assumption which can lead to
errors if it does not hold for the data under analysis. Reference to
Table 2 will help explain the assumption. In this table, individual
L is not included in the highest N pairs listed in the left hand
column. It is assumed first that L will eventually join a category
which contains one of these N pairs. Suppose that this occurs at
the quadradic level. It is assumed that individual L will have a
higher score in this quadrad than he would have in any other
quadrad if we had started with any other N pairs. A similar
assumption is made for all individuals. In general, the assumption
says that the N highest pairs are sufficient in order to realize the best
classifications later on.


SOME FINDINGS


Some of these methods have been tried sufficiently to suggest
hypotheses for further study. A first suggestion is that there are
items whose responses have differential validity; that is, they measure
different things in different people. When the value of these items is
assessed in terms of invariant validity, they are found to have rather low
validity. It is possible that nearly all items may have at least some
differential validity. We may have in general underestimated the
validity of all items by attempting to assess it in terms of invariant
validity exclusively.


TABLE 2 


AGREEMENT SCORES BETWEEN INDIVIDUALS AND PAIRS OF INDIVIDUALS:
HYPOTHETICAL DATA

The x's exist where the score would mediate between an individual and a pair
containing that individual; the table is restricted to scores for groups of three
different individuals.
When we have compared our types with external criteria, we have
found those at the lower levels such as species and genera to be
more predictive than the higher level ones, where the categories
might be expected to be more reliably determined. This outcome
suggests that there are many relatively unique types and that large
samples of subjects are essential in order to isolate them in a
dependable fashion. We thus need pattern-analytic methods which
can classify several hundred subjects. Electronic computers with
high storage capacity make this goal attainable.
THE CONTENT OF objective personality and interest test items
customarily has been selected with face validity in mind.
Personality inventories typically use items which inquire how one
feels about being in crowds or whether one often has nightmares.
Similarly, interest tests use content which bears on one’s enjoyment
of mechanical puzzles, building birdhouses, listening to music, etc.
Rarely does one find an item dealing with nightmares on a vocational
interest test nor, by the same token, a birdhouse item on a
personality test. In general, psychological tests in these areas use
items that are reasonably tidy in the sense of possessing obviously
appropriate content. Whatever the correlations might be between such
test scores and suitable behavioral criteria, the face of validity
shines in every item.


CONTENT AND EARLY PERSONALITY TESTS 
This has been particularly true of the earlier personality tests. The
items of these older tests sampled a wide range of what was regarded
as significant behavior and accordingly had a heavy interlarding of
items that dealt with symptoms of maladjustment. Then it was
usually assumed that the subject would give honest answers when
taking the test. The notion of a direct relationship between item
content and what was being measured was probably a heritage from
achievement testing procedures. After all, achievement tests in
subject matter fields such as geography, arithmetic, history, etc.,
have been used for centuries while psychological tests, as we know
them, have been in existence for but a few decades. If one wishes to
measure achievement in geography, for example, one uses a test that
asks questions about geography. That is only common sense and admits
no debate. Thus it was natural to apply the same technique to
personality and interest measurement by using items which
unmistakably mirrored the area being measured. If personality
adjustment were being assessed, the items were straightforward in
asking whether the subject “worried more than most people” or
whether he “often had bad dreams at night.” This approach worked in
its fashion. It did not produce very good measures of personality,
but it produced better measures than we had before. It was obviously
a good start in the right direction.

There were a number of straws in the wind which indicated that 
the a priori face value of item content might not be worth much 
in some behavioral areas. At first glance, it seems reasonable to 
postulate a close relationship between accident frequency and
reaction time on the assumption that the slowpoke would be prone
to mishaps. As early as 1929, however, Farmer and Chambers (30)
reported that correlations between reaction time and number of 
accidents for several occupational groups hovered around zero. 
Curiously, in motor vehicle driver testing, reaction time measures 
are often included even now in “dummy” car testing apparatus, 
apparently on the invalid assumption that those who can get their 
feet off the gas pedal and on the brake will have fewer accidents. 
It is only fair to observe that this particular response measure may 
be included less for its predictive efficiency than for its sales value 
in influencing business executives who hire the testing. 


CONCERN FOR FACE VALIDITY


While psychologists would not be misled by face validity in such
situations when they were in the iron grip of a string of zero criterion
correlations, they sometimes appear to cling to other unsupported
beliefs concerning objective personality tests. Indeed, they
sometimes express such beliefs in print. In 1945, for example, Meehl
(47) found it necessary to take Hutt (38) to task for asserting that
“structured” personality tests were based on the assumption that
the items would have the same meaning for all subjects who took
the test. This is probably still a fairly prevalent misconception;
however, as Meehl emphatically points out, neither the Minnesota
Multiphasic Personality Inventory (MMPI) nor Strong’s Vocational
Interest Blank (VIB) makes this assumption.

A tremendous number of articles concerning these two tests 
have been published (over 800 on the MMPI alone) and it 
should be quite clear that from the MMPI or VIB stand- 
point, it is not important whether the meaning is the same 
or different for all subjects, nor does it matter whether the 
subject is being truthful or even a good judge of his own be- 
havior. If a subject responded “true” to an item such as “I like to 
mingle in crowds,” it is not important whether he really liked crowds 
or not. The important thing is what behavioral correlates can be 
empirically identified with such a response. If most people like 
crowds and most paranoids significantly do not, we have a good item 
for measuring paranoia. Furthermore, with similar levels of statis- 
tical significance, any content could be used for such an item whether 
it dealt with crowds, horses, or Socrates. These last few sentences
may be unnecessarily sounding the alarm long after the guard is
roused and on the alert; yet the point is of such import that
excessive zeal may be tolerated.

Thus far it has been noted that personality and interest test items
do not need to have a priori meaningfulness. The MMPI and VIB
have fully demonstrated this, and they are very good tests indeed.
Rather paradoxically, however, while in practice adhering to
empirical test rather than a priori item content, Hathaway and
McKinley (36) in the MMPI and Strong (57) in the VIB paid a good deal
of attention to the content of their test items. The MMPI used 26
carefully described categories of items which ranged from general
health to psychotic symptoms, and the VIB employed lists of
occupations, amusements, etc. There was an obvious concern for face
validity in content. Just why this was so is hard to say; however,
it may have been a sardonic deference to what users of the tests
might expect to find. Be that as it may, the VIB used only items
which had some clear-cut relationship to vocations and the
work-a-day world. The MMPI has mostly items with content of reasonable
face validity for personality measurement; however, it also has a
sizeable number of items which are admittedly enigmatic when
scrutinized in this way.

It appears, therefore, that while objective personality and interest 
tests pay rather careful attention to item content, at least two of 
the best and most widely used tests of this type do not depend for
scoring purposes upon an appropriate or fitting response being made
to the test content. As an elaboration, Meehl’s (47, p. 300) remark
has pertinence, “Thus it puzzles us but does not disconcert us when
this relation cannot be elucidated, the science of behavior being
in the stage that it is. That ‘I sometimes tease animals’ (answered
False) should occur in a scale measuring symptomatic depression
is theoretically mysterious, just as the tendency of certain
schizophrenic patients to accept ‘position’ as a determinant in responding
to the Rorschach may be theoretically mysterious.”

The present paper is an attempt to offer an explanation which 
seeks to dispel a portion of the theoretical mystery to which Meehl 
refers and to gather evidence from various sources in an attempt 
to demonstrate that particular content of objective personality and 
interest tests is unimportant. This is not to say that no content 
whatsoever is essential for personality and interest tests. Some 
sort of stimulus pattern is required, of course; however, virtually 
any content of any sense modality should be suitable and under 
some conditions the content may be so insignificant as hardly to 
deserve the name. I believe the available evidence indicates that 
items dealing with jobs, social activities, attitudes, adjustment, etc. 
are quite unnecessary for objective personality, interest, and similar 
tests. One can use such content if he so desires; but one can equally 
well use abstract designs, sounds, lists of foods, lights, imaginary 
questions, spiral after-effect, and content of an equally wide range. 
In this sense, then, item content is unimportant. 


THE DEVIATION HYPOTHESIS


Before reviewing the evidence for the unimportance of particular
item content, it seems appropriate to offer some theoretical
explanation of why content is not important. At present this can be stated
only as a hypothesis, though one which has been supported at a
number of points by empirical test. This is the Deviation Hypothesis
which has been set forth in several previous publications (12, 18).
An outline of the hypothesis will serve as a framework for the
remainder of the present paper. There are literally hundreds of
studies dealing with isolated empirical demonstrations of the prediction,
with varying degrees of success, of certain facets of behavior from
other, ostensibly unrelated facets of behavior. The commonest
examples of this are the variety of clinical measurement devices,
although there are many others. The question at hand is how we
can account for the predictive usefulness described in so many of
these researches. What, in other words, is the common thread
running through such myriad studies which deal with bits and pieces
of behavior? It is to these questions that the Deviation Hypothesis
is directed.

The Deviation Hypothesis is based upon biased responses. The 
emphasis, however, is not placed upon the bias itself but rather 
upon the departures from an established pattern of bias. This latter 
is the important factor; indeed, it is the key to the problem before 
us. In a “true-false,” “head-tails,” “agree-disagree” response situa- 
tion, for example, the responses rarely follow a normal probability 
distribution where the stimulus pattern is relatively unstructured. 
Instead of a 50-50 percentage distribution of responses, one often 
finds 80-20 or some equally skewed pattern indicating bias. Cron- 
bach (26, 27) has described a large number of such response sets, 
as he called them, in psychological testing; and other writers (17, 
24, 33, 43, 52) have also provided evidence for the existence of bias. 
The Deviation Hypothesis is not directly concerned with the
responses which contribute to the pile-up, in the 80 per cent who
call “heads” when a coin is flipped, for example. Rather the interest
is centered in the 20 per cent who go counter to the “heads”
response and say “tails” or possibly something else or even say
nothing at all.
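
The distinction between the pile-up and the responses that run counter to it can be made concrete with a small sketch. The Python below is a simplified illustration, not a procedure taken from the studies cited: it treats the most frequent answer to each item as the established bias and simply counts, for each person, the items on which he departs from it, whereas the hypothesis itself speaks of departures defined by statistical significance. The data and the names `modal_responses` and `deviation_counts` are hypothetical.

```python
from collections import Counter


def modal_responses(answers):
    """Return the most frequent answer to each item across all persons."""
    items = zip(*answers.values())
    return [Counter(column).most_common(1)[0][0] for column in items]


def deviation_counts(answers):
    """Count, per person, the items answered against the modal response."""
    modal = modal_responses(answers)
    return {
        person: sum(given != usual for given, usual in zip(row, modal))
        for person, row in answers.items()
    }


# Invented "heads" (H) or "tails" (T) calls by five persons on four items.
ANSWERS = {
    "p1": "HHTH",
    "p2": "HHTH",
    "p3": "HHTT",
    "p4": "HTTH",
    "p5": "TTHT",
}

print(modal_responses(ANSWERS))
print(deviation_counts(ANSWERS))
```

In this invented example person p5 departs from the modal answer on every item, and it is such persons in whom the hypothesis is interested.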

The persons who deviate from the established pattern of bias in
such insignificant responses as responding “tails,” “dislike,”
“disagree,” and so on are not merely different in such minor or
noncritical aspects of behavior. They are also different in critical or
significant aspects of behavior, or so the Deviation Hypothesis
would have it. The noncritical aspect of behavior is a reflection
of a critical aspect; the two go hand in hand. The critical aspect
is a personality manifestation. It may be aberrant adjustment such
as schizophrenia or chronic anxiety or it may be some other aberrant
condition such as genius, mental retardation, accident proneness,
creativeness, chronic heart disease, or any other condition which
may be objectively defined on some behavioral dimension. Thus
we would include physicians as well as kleptomaniacs, engineers
as well as scholastic under-achievers as being capable of suitable
objective, operational definition.

Accordingly, the Deviation Hypothesis has been stated (13, p.
159), “Deviant response patterns tend to be general; hence those
deviant behavior patterns which are significant for abnormality
(atypicalness) and thus regarded as symptoms (earmarks or signs)
are associated with other deviant response patterns which are in
noncritical areas of behavior and which are not regarded as
symptoms of personality aberration (nor as indicators, signs, earmarks).”
It should be emphasized that abnormality is to be taken in the
literal meaning of “away from normal.” In this sense psychotics
or lawyers, for example, show certain responses which distinguish
them from each other and also from the rest of the general public.
Everyone makes many responses which are quite like those made
by the majority of people. Everyone also makes certain other
responses which are peculiarly his own or peculiarly shared by a
special group. These certain other responses are, of course, not shared
by a majority of people; hence they are the responses designated
as deviant and may be precisely defined statistically in terms of the
level of significance by which they depart from the common
response. Insofar as can be ascertained, these deviant responses are
the product, singly or in combination, of past learning, inherited
structure, and organic or physiological state. Thus it may be said
that the deviant responses which differentiate engineers from
physicians, for example, are chiefly the product of past learning whereas
organic factors (possibly at times associated with inherited defect)
are presumed to be the basis for any deviant responses exhibited
by patients with chronic heart disease.
Very likely, learning pervades to some degree all aspects of
deviant responses, including those responses which are rooted in
hereditary or physiological aberration. Habits of living, to take a case
in point, may produce stress which culminates in cardiac disorder
in the manner described by Selye (51). In other cases, presumably,
a weakness of heart structure would require new learning, that is,
habits which would avoid taxing the weak heart. According to the
Deviation Hypothesis, both conditions should produce deviant
response patterns, though not necessarily the same pattern. It should
be emphasized that these are but illustrations, not evidence, of how
learning might be involved in what appears to be essentially an
organic condition.


PARTICULAR STIMULUS CONTENT IS UNIMPORTANT 


Thus, what has been said is that no particular content is needed
for interest and personality tests, nor, for that matter, in a wide
variety of other behavioral measures. What is needed are stimuli
that will elicit deviant responses or, more accurately, stimuli which
will produce “response sets” or biases from which deviant response
patterns may be statistically identified. Such stimuli should be
relatively unstructured since lack of structure facilitates the
appearance of bias. Accordingly, the hypothesis with respect to item
content has been stated (13, p. 160), “Stimulus patterns of any sense
modality may be used to elicit deviant response patterns; thus
particular stimulus content is unimportant for measuring behaviors in
terms of the Deviation Hypothesis.” Attention is called to the fact
that this statement makes no assertion that any item is just as good
as every other for discriminative purposes. Such a claim would be
patently absurd.
What is meant is that if an item concerned with nightmares, for
example, distinguishes schizophrenics from normal persons at the
five per cent level of confidence, one can locate equally valid items
which are utterly different in content, items with content such as
interlaced triangles, musical sounds, autokinetic phenomena, etc.
But if interlaced triangles are found to be equivalent to nightmares
as an item, it certainly does not follow that a drawing of interlaced
circles would be just as good as either, or that some musical sound
must perforce be as valid as all three. While such might be the
case, empirical demonstration is absolutely necessary to determine
possible item equivalence. In other words, the same procedure that
was used to ascertain the value of the nightmare item must also be
applied to the interlaced triangle item, or any other item.
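
The requirement that every item, whatever its content, pass the same empirical test can be illustrated with a short sketch. The Python below is one possible way of checking an item at the five per cent level and is not the procedure used in the studies cited: it applies a Pearson chi-square test (without continuity correction) to a two-by-two table of responses for a criterion group and a normal group. The counts are invented, and the function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2):
    """Upper-tail probability of chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def discriminates(criterion_like, criterion_dislike,
                  normal_like, normal_dislike, alpha=0.05):
    """True if the item separates the two groups at the given level."""
    chi2 = chi_square_2x2(criterion_like, criterion_dislike,
                          normal_like, normal_dislike)
    return p_value_1df(chi2) < alpha


# Invented counts: 50 criterion subjects and 50 normal subjects per item.
verbal_item = (30, 20, 40, 10)   # e.g. an item about nightmares
design_item = (35, 15, 40, 10)   # e.g. an abstract design

print(chi_square_2x2(*verbal_item), discriminates(*verbal_item))
print(chi_square_2x2(*design_item), discriminates(*design_item))
```

The same function is applied to both items; nothing in the test refers to what the item is about, only to how the two groups respond to it.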
From what has been said, it should be possible to use a series
of abstract designs for test items and do about as good a job of
measuring personality as can be done with traditional verbal items
of the “I am troubled with insomnia” variety. Content other than
abstract designs could, of course, be used; but this example will
do for a beginning. The Perceptual Reaction Test (PRT) (16) has
been used for this purpose since it was developed to elicit “set.”
It is composed of 60 abstract designs drawn with ruler and compass,
and the subject checks either Like Much, Like Slightly, Dislike
Slightly, or Dislike Much for each design. Only seven minutes, on
the average, are required by normal subjects to take the test. It
would really be a much more effective comparison if the PRT had
several hundred designs since most of the personality tests in wide
use have hundreds of items. Be that as it may, a number of studies
have been completed which indicate that even a mere 60 designs
of no particular meaning can do a good job of reflecting certain
facets of personality.
VARIETY OF USABLE STIMULI


Berg and Collier (15) found that groups of high anxiety subjects 
operationally identified by the Taylor Anxiety Scale and the Sway 
Suggestibility Test made significantly more extreme choices (like 
much or dislike much) on the PRT when compared to low anxiety 
subjects. Lewis and Taylor (41) obtained quite comparable results 
with the same test; however, their findings demonstrated that the 
extreme choices were not preferences for extreme position, as Berg 
and Collier thought, but were actually preferences for extreme op- 
tion content in the PRT. A much more detailed study of the diag- 
nostic possibilities of the PRT was published by Barnes (7). Using 
1,700 normal persons (1,000 males, and 700 females) as controls, 
Barnes administered the PRT to 546 (360 males, 186 females) clinic 
and mental hospital patients. By identifying the deviant responses, 
Barnes was able to construct clinical scales as follows: Delta, for 
general NP disturbance; Psi, for psychotic condition; Sigma, for 
schizophrenia; Chi, for character disorder; Psi-Chi, for separating 
diagnostically psychotic and character disorder states. In Barnes’
(7, p. 290) words, “It is concluded that response set on the PRT
is related to personality factors, that it has a degree of reliability 
which compares well with other tests of personality factors, and 
that it can be used to assess personality disorder.” 

The PRT was also used by Hesterly and Berg (37) to measure 
maturity in relation to schizophrenia. The PRT responses of groups
of normal children aged 8, 10, and 12, were compared with normal 
adults; and the younger age groups were found to have response 
patterns most different from adults with the difference decreasing 
for the older age group. Since immaturity is commonly associated 
with schizophrenia, it was postulated that no significant difference 
would be found for the deviant response patterns of adult schizo- 
phrenics and normal young children. This was found to be the case. 
The youngest groups of normal children were different in deviant 
response patterns from normal adults but not different from adult 
schizophrenics.

Thus it appears that with a simple test composed of only 60
meaningless designs, we can measure certain aspects of personality by
means of the Deviation Hypothesis in the same way that the usual 
objective personality inventories do when using traditional verbal 
content. By the same token, it would not be surprising to find a 
relationship between behavior disorders and responses to other non- 
traditional personality test content such as a list of foods. Wallen 
(61) and Gough (34), for example, showed that neurotic males
indicated that they disliked significantly more foods than normal
males. 

In another study, Wallen (62) found a similar, significantly greater 
number of food aversions in various clinical diagnostic categories 
such as intra-cranial injury, anxiety neurosis, hysteria, epilepsy, etc. 
when compared to normal males. In a comparison of number of 
food aversions and scores on a test of adjustment, Altus (4) found a 
correlation of .497 between the two measures for data obtained 
from Army illiterates. Smith, Powell, and Ross (55) found that high- 
anxiety individuals, as identified by the Taylor Manifest Anxiety 
Scale, showed significantly more food aversions than low-anxiety 
subjects. These studies were not done as investigations of the 
Deviation Hypothesis since they antedate publications of this con- 
cept. However, they illustrate an aspect of the unimportance of 
particular item content; and like many other studies concerned with 
critical and noncritical responses, they can be fitted comfortably 
into the deviant response concept.

Stimuli for conditioned response, autokinetic and spiral after- 
effect perceptions involve noncritical areas of behavior in the sense 
that the responses are not regarded in themselves as symptoms or 
earmarks. However, several studies have indicated that deviant 
patterns in noncritical facets of such behavior are reflections of de- 
viations in critical areas. Taylor (58) and Spence and Taylor (56) 
found that anxious subjects, as identified by her Manifest Anxiety 
Scale, were consistently and significantly superior in all measures 
of eyeblink conditioning and extinction compared to nonanxious 
subjects. Voth (60) tested 845 mental hospital patients and 423
normal subjects with the autokinetic phenomenon. He found that 
distinctive patterns of deviant responses were characteristic of
certain patient groups. Schizophrenic, epileptic, and anxiety patients,
among others, revealed more pronounced apparent movement as
compared to normal groups. Manic-depressive and involutional
patients either experienced no apparent movement or the movement
was much less extensive than normal. Price and Deabler
(49) and Freeman and Josey (31) used the Archimedes spiral
after-effect as a means of differentiating patients with organic
brain damage and patients with memory impairment from normal
subjects. Aaronson (1) used a similar technique with an epileptic
population.

VARIATIONS IN RESPONSE MEASURES

More recently, subtler aspects of language have been examined
with respect to their value as indicators of [...]
states. These studies may be mentioned as additional important
bits of research which lend support to the assertion that particular
item content is unimportant for personality assessment. These
studies can also be fitted into the broad framework of the Deviation
Hypothesis, although they were not intended as tests of that
hypothesis [...] fashion; however, it seems feasible that [...]
it be deemed advantageous to do so.

[...], for example, found that schizophrenics [...]
on a special vocabulary test when compared [...].
Lorenz and Cobb (44, 45) reported that
psychoneurotic patients used more verbs and pronouns but fewer
[...] capable of manifesting deviant response patterns.

[...] investigate such matters as obesity (40), facial
disfigurement [...], on the hypothesis that the
responses were related to personality. A few studies indicate that
auditory content may also be validly utilized as measures of
personality variables. A long-play record of musical sounds has been
developed by Cattell and Anderson (19) which is probably the first
use of musical excerpts for diagnosing mental illness. Simon and
others (53) have reported differences in the recognition of mood in
music for patient populations and for normal subjects. The
“tautophone” (35), a device which emits sounds which resemble spoken
words but which are actually meaningless, has been employed as
a personality test.

No studies have been found in which sense modalities such as 
taste, smell, tactile sensitivity and the like have been used in con- 
nection with personality measurement. One study by Singer and 
Young (54), however, indicates that response bias exists in responses 
to olfactory stimuli; hence it may be possible to use this sense for 
personality assessment. Theoretically, of course, all senses should 
provide content which may be used in this way. In practice, sense 
modalities other than vision are relatively cumbersome to use for 
personality testing purposes. This may account for the vast pre- 
ponderance of personality measures which involve reading or look- 
ing at something. It should, however, be obvious that a wide range 
of content can be used and has been used in appraising personality. 

What is essentially a deviant response technique has been employed
in studies of physical diseases, some of which have psychosomatic
components. Various stimulus patterns have been used, such
as those found in the Rorschach, TAT, Blacky Test, MMPI, etc. 
A sampling of such studies includes disorders such as uterine
dysfunction (25), peptic ulcer (18), rheumatoid arthritis (22),
dermatosis (48), leprosy (42), constipation (5), and others. To what extent
personality variables influenced or determined the course of these
diseases or to what extent the diseases produced certain personality
changes is unknown. The significant point is that, even in such
cases, a variety of stimulus content can be used to differentiate them
from normal subjects on the basis of deviant responses. 

Some scales which have emphasized content with direct face 
validity, such as the authoritarian personality F scale (3), have been 
shown actually to measure a considerable amount of simple acqui- 
escence and not only fascist proclivities. This has been shown by 
studies such as those of Bass (10), Cohn (23), and Chapman and
Campbell (20). By its use of a wide range of content and its use of atypical
responses, the MMPI moved in the direction of de-emphasizing 
item content. This is particularly borne out in the use MMPI makes 
of subtle items, that is, items which are quite unrelated in terms of 
face validity to the personality dimensions they measure. 

If it is only the deviant responses that are important, no matter
[...] might ostensibly be regarded as offering an example of a single
deviant response which is indicative of critical behavior aberration. 
Yet actually such preference for women’s clothes is probably not
one, but rather a large number of deviant responses. That is, the
transvestite would probably use lipstick and perfume, rouge his 
cheeks, enamel his nails, walk with mincing gait, separately put 
on a number of articles of female clothing, etc. Each of these should 
be appropriately regarded as separate deviant responses, adding 
up to a respectable total. 

Since everyone exhibits some deviant responses, it seems unlikely
that a single response will suffice for identifying significant aspects
of personality. Attempts to use a few simple deviant response
measures, such as the handedness studies of Wile (65), Goddard (32),
and Doll (28), met with very limited success in their investigations
of the relationship of left hand-preference to feeblemindedness and
neurotic reactions. Yet these studies did indicate that hand
preference could probably be used as one deviant response item. A large
number of such items might conceivably be scaled into a respectable 
personality test. Be that as it may, there is good reason to believe 
that a wide variety of content can be used for personality test items; 
however, as has been the case in the past, a lengthy series of items 
will be necessary, whatever content may be used. 
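
The idea of scaling many deviant response items into one score can be sketched as follows. This is a minimal illustration only, not a procedure taken from the studies cited: the four option codes, the 15 per cent cutoff for calling a response deviant, and the toy data are all assumptions made for the example.

```python
from collections import Counter

# Assumed four-point like/dislike options (codes are hypothetical).
OPTIONS = ("LM", "LS", "DS", "DM")


def deviant_options(norm_group, threshold=0.15):
    """For each item, return the options chosen by fewer than
    `threshold` of the normative group (the cutoff is an assumption)."""
    n_people = len(norm_group)
    n_items = len(norm_group[0])
    result = []
    for i in range(n_items):
        counts = Counter(person[i] for person in norm_group)
        result.append({opt for opt in OPTIONS
                       if counts[opt] / n_people < threshold})
    return result


def deviance_score(responses, deviant):
    """Count the items on which a subject gave a deviant response."""
    return sum(1 for r, d in zip(responses, deviant) if r in d)


# Invented normative data: 10 subjects, 3 items.
norm_group = ([("LS", "DS", "LS")] * 6
              + [("LM", "DS", "LS")] * 3
              + [("DM", "LS", "LM")])
deviant = deviant_options(norm_group)

typical_subject = ("LS", "DS", "LS")
atypical_subject = ("DM", "DM", "DM")
print(deviance_score(typical_subject, deviant))
print(deviance_score(atypical_subject, deviant))
```

With only three items the score can separate very little, which is the point made above: a lengthy series of items is needed before such a count becomes a usable scale.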

The evidence reviewed here is believed to indicate that there is 
nothing of special value in particular item content for objective 
personality and similar tests. Verbal content of the traditional kind 
used in personality tests is not essential; for a wide variety of content 
may be employed with equal effectiveness. Indeed, any content 
which produces deviant response patterns will serve, judging from 
the available evidence. The important thing is not particular con- 
tent, but rather a series of deviant responses and operationally clean 
criterion groups. These are the absolute essentials for using deviant 
responses to measure personality. Thus it is possible to use items of 
traditional content to assess personality; however, conditioned
responses, spiral after-effect, abstract designs, autokinetic
phenomena, musical sounds, language behavior, drawings and other content
may also be used. Some test items, of course, will be better for cer- 
tain purposes than others, just as some items of conventional content 
are better than others for certain testing purposes. But whatever 
the content, valid discriminations of a number of facets of
personality can be made. Accordingly, for personality and similar tests,
a particular item content is unimportant. 


References

1. Aaronson, B. S. Age, intelligence, aphasia, and the spiral after-effect
in an epileptic population. J. clin. Psychol., 1958, 14, [...].
2. Abel, T. M. Figure drawing and facial disfigurement. Am. J.
Orthopsychiat., 1953, 23, 253-264.
3. Adorno, T. W. et al. The authoritarian personality. New York:
Harper, 1950.
4. Altus, W. D. Adjustment and food aversions among army illiterates.
J. consult. Psychol., 1949, 13, 429-439.
5. Altus, W. D. Constipation and adjustment among illiterate males.
J. consult. Psychol., 1950, 14, 25-31.
6. [...], Mathis, G. K., and [...].
7. Barnes, E. H. [...] test responses to psycho[...]. [...]
Psychol., 1955, 51, 286-290.
8. [...] Factors, response bias, and the MMPI. J. consult.
Psychol., 1956, 20, 419-491.
9. Barnes, E. H. Response bias and the MMPI. J. consult. Psychol.,
1956, 20, 371-374.
10. Bass, B. M. Authoritarianism or acquiescence? J. abnorm. soc.
Psychol., 1955, 51, 616-623.
11. Berg, I. A. The reliability of extreme position response sets in two
tests. J. Psychol., 1953, 36, 3-9.
12. Berg, I. A. Response bias and personality: the Deviation Hypothesis.
J. Psychol., 1955, 40, 60-71.
13. [...]sis. J. counsel. Psychol., 1957, 4, [...].
14. Berg, I. A. Word choice in the interview and personal adjustment.
J. counsel. Psychol., 1958, 5, 130-135.
15. Berg, I. A. and Collier, [...] S. Personality and group differences in
extreme response sets. Educ. psychol. Measmt., 1953, 13, 164-169.
16. Berg, I. A., Hunt, W. A., and Barnes, E. H. The perceptual reaction
test. Evanston, Ill.: 1949.
17. Berg, I. A. and Rapaport, G. M. Response bias in an unstructured
questionnaire. J. Psychol., 1954, 38, 475-481.
18. Blum, G. S. and Kaufman, J. B. Two patterns of personality dynamics
in male peptic ulcer patients as suggested by responses to the
Blacky Pictures. J. clin. Psychol., 1952, 8, 273-278.
19. Cattell, R. B. and Anderson, J. C. The measurement of personality and
behavior disorders by the IPAT Music Preference Test. J. appl.
Psychol., 1953, 37, 446-454.
20. Chapman, L. J. and Campbell, D. T. Response set in the F scale. J.
abnorm. soc. Psychol., 1957, 54, 129-132.
21. Chodorkoff, B. and Mussen, P. H. Qualitative aspects of the
vocabulary responses of normals and schizophrenics. J. consult. Psychol.,
1952, 16, 43-48.
22. Cleveland, S. E. and Fisher, S. Behavior and unconscious fantasies
of patients with rheumatoid arthritis. Psychosom. Med., 1954, 16,
327-333.
23. Cohn, T. S. Is the F scale indirect? J. abnorm. soc. Psychol., 1952,
47, 185-199.
24. Cottle, W. C. and Powell, J. O. The effect of random answers on the
MMPI. Educ. psychol. Measmt., 1951, 11, 224-227.
25. Crammond, W. A. Psychological aspects of uterine dysfunction.
Lancet, 1954, 267, 1241-1245.
26. Cronbach, L. J. Response sets and test validity. Educ. psychol.
Measmt., 1946, 6, 475-494.
27. Cronbach, L. J. Further evidence on response sets and test designs.
Educ. psychol. Measmt., 1950, 10, 3-31.
28. Doll, E. A. Anthropometry as an aid to mental diagnosis. Pub. New
Jersey Trng. Scho., 1916, 8, 1-7.
29. Fairbanks, H. The quantitative differentiation of samples of spoken
language. Psychol. Monogr., 1944, 56, 19-38.
30. Farmer, E. and Chambers, E. G. A study of personal qualities in
accident proneness and proficiency. Industrial Health Research
Board of Great Britain, 1929, Report No. 55.
31. Freeman, E. and Josey, W. E. Quantitative visual index to memory
impairment. Arch. Neurol. Psychiat., 1949, 62, 794-796.
32. Goddard, H. H. The height and weight of feebleminded children in
American institutions. J. nerv. ment. Dis., 1912, 39, 217-235.
33. Goodfellow, L. D. The human element in probability. J. gen.
Psychol., 1940, 23, 201-205.
34. Gough, H. G. An additional study of food aversions. J. abnorm.
soc. Psychol., 1946, 41, 86-88.
35. Grings, W. W. The verbal summator technique and abnormal mental
states. J. abnorm. soc. Psychol., 1942, 37, 529-545.
36. Hathaway, S. R. and McKinley, J. C. The Minnesota Multiphasic
Personality Inventory (Manual). New York: Psychol. Corp., 1949.
37. Hesterly, S. O. and Berg, I. A. Deviant responses as indicators of
immaturity and schizophrenia. J. consult. Psychol., 1958, 22, 389-393.
38. Hutt, M. L. The use of projective methods of personality
measurement in army medical installations. J. clin. Psychol., 1945, 1, [...].
39. Jackson, D. N. Content and style in personality assessment.
Princeton, N. J.: Educational Testing Service, RM-57-11, 1957.
40. Kotkov, B. and Goodman, M. The Draw-a-Person Tests of obese
women. J. clin. Psychol., 1953, 9, 363-364.
41. Lewis, N. A. and Taylor, J. A. Anxiety and extreme response
preferences. Educ. psychol. Measmt., 1955, 15, 111-116.
42. Lord, E. Group Rorschach responses of thirty-five leprosarium
patients. J. Proj. Tech., 1954, 18, 202-207.
43. Lorge, I. Gen-like: halo or reality? Psychol. Bull., 1937, 34, 545-546.
44. Lorenz, M. and Cobb, S. Language behavior in manic patients. Arch.
Neurol. Psychiat., 1952, 67, 763-770.
45. Lorenz, M. and Cobb, S. Language behavior in psychoneurotic
patients. Arch. Neurol. Psychiat., 1953, 69, 684-694.
46. Mann, M. B. The quantitative differentiation of samples of written
language. Psychol. Monogr., 1944, 56, 41-74.
47. Meehl, P. E. The dynamics of “structured” personality tests. J. clin.
Psychol., 1945, 1, 296-303.
48. Narciso, J. C., Jr. Some psychological aspects of dermatosis. J.
consult. Psychol., 1952, 16, 199-201.
49. Price, A. C. and Deabler, H. L. Diagnosis of organicity by means of
spiral aftereffect. [...], 1955, 19, 299-302.
50. [...] The type-token rati[...].
51. [...] stress of life. New York: McGraw-Hill, 1956.
52. Shelley, H. P. Response set and the California Attitude Scale. Educ.
psychol. Measmt., 1955, 16, 63-67.
53. Simon, [...].
54. Singer, [...] and Young, P. T. Studies in affective reaction: [...].
J. genet. Psychol., 1941, 24, [...].
55. Smith, [...], Powell, [...], and Ross, [...]. Manifest anxiety and food
[...]. J. abnorm. soc. Psychol., 1955, 50, 101-104.
56. Spence, [...] and Taylor, [...]. [...] relation of conditioned response
[...] in normal, [...]otic, and psychotic subjects. J. exp. Psychol.,
1953, 45, 265-279.
57. Strong, E. K., Jr. Manual for Vocational Interest Blank for Men.
Stanford: Stanford University Press, 1935.
58. Taylor, J. A. The relationship of anxiety to the conditioned eyelid
response. J. exp. Psychol., 1951, 41, 81-92.
59. Vinson, D. B. Response to electroshock therapy as evaluated by
mirror drawing. J. clin. exp. Psychopathol., 1952, 13, 201-210.
60. Voth, A. C. An experimental study of mental patients through the
autokinetic phenomenon. Am. J. Psychiat., 1947, 103, 793-805.
61. Wallen, R. W. Food aversions of normal and neurotic males. J.
abnorm. soc. Psychol., 1945, 40, 77-81.
62. Wallen, R. W. Food aversions and behavior disorders. J. consult.
Psychol., 1948, 12, 310-312.
63. Welsh, G. S. and Dahlstrom, W. G. Basic readings on the MMPI in
psychology and medicine. Minneapolis: University of Minnesota
Press, 1956.
64. Wheeler, W. M., Little, K. B. and Lehner, G. F. The internal structure
of the MMPI. J. consult. Psychol., 1951, 15, 134-141.
65. Wile, I. S. The relation of left-handedness to behavior disorders. Am.
J. Orthopsychiat., 1932, 2, 44-57.


VI 


Social Desirability and 
Personality Test Construction 


ALLEN L. EDWARDS 
The University of Washington 




In the typical personality inventory, the number of possible re- 
sponses available to the subject is fixed by the nature of the test 
so that the subject must choose one of the several alternatives pre- 
sented to him. I shall refer to these various alternatives as response 
categories. An objective personality inventory may have any
number of response categories. However, we seldom find inventories
with more than five response categories and, in most cases, the
inventories in current use have only two or three response categories.
These are usually of the form: True-False, Yes-No, Agree-Disagree,
Like-Dislike, and so forth. When three response categories are pro- 
vided, the third category is typically an Undecided category. 

Although it is not a necessary condition for an objective
personality inventory, we generally find that only one of the response
categories is keyed. By a keyed response, I shall mean the response
that is assigned a non-zero scoring weight. With a True-False test
designed to measure a personality variable, the keyed response is
the one that we believe is more likely to be given by those who
have a greater degree of the variable than by those who have a 
lesser degree. For example, in an inventory designed to measure 
introversion, the following item might appear: I keep in the back- 
ground on social occasions. If we believe, for one reason or another, 
that those who have a high degree of introversion are more likely to 
answer True to this item than those who have a lesser degree of 
introversion, the keyed response would be True. 
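
Scoring with a single keyed response per item can be sketched as below. The sketch assumes a unit weight for each keyed response and zero otherwise; the second and third items and all of the answers are invented for illustration, and only the first item comes from the text.

```python
def score_scale(responses, key):
    """Count the items on which the subject gave the keyed response.

    responses: dict mapping item text -> the subject's answer
    key:       dict mapping item text -> the keyed answer (weight 1)
    """
    return sum(1 for item, keyed in key.items()
               if responses.get(item) == keyed)


# Hypothetical introversion key; only the first item appears in the text.
introversion_key = {
    "I keep in the background on social occasions.": "True",
    "I enjoy speaking before large groups.": "False",
    "I prefer to spend my evenings alone.": "True",
}

subject = {
    "I keep in the background on social occasions.": "True",
    "I enjoy speaking before large groups.": "True",
    "I prefer to spend my evenings alone.": "True",
}

introversion_score = score_scale(subject, introversion_key)
print(introversion_score)
```

Note that a keyed response may be False as readily as True; the key records the direction believed to go with a greater degree of the variable.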

It will be convenient to confine the present discussion to those
objective personality inventories in which a limited number of
response categories are available to the subject and in which only one
of the possible responses is keyed. The points I wish to make,
however, have general implications and would, I believe, apply also to
those inventories in which differential weights are assigned to
multiple categories.


METHODS USED IN DEVELOPING INVENTORIES


In constructing objective personality inventories, three somewhat
different procedures have been followed. Cattell (4) and Guilford
(16, 17), for example, have used factor analysis techniques in
developing their personality inventories. The Minnesota Multiphasic
Personality Inventory and the Vocational Interest Blank, on the
other hand, were constructed by Hathaway and McKinley (20) and
Strong (26), respectively, using a procedure which I shall call the
method of criterion groups. Still a third approach is the one used
by Allport, Vernon, and Lindzey (1) in constructing the Study of
Values, and which I have also used in developing the Personal
Preference Schedule (9). This latter approach I shall refer to as
the construct approach.

If the factor analytic approach is followed, one starts initially
with a large pool of items. Subjects are asked to respond to each
item and the responses to all possible pairs of items are correlated.
The resulting intercorrelation matrix is factor analyzed in
anticipation of obtaining a smaller number of factors th[...]
of the factor matrix, [...]
items with high loadings on a [...]
other factors are placed together [...]
items with high loadings on a [...]
see what they have in common. These items, for the factor analyst,
will constitute a scale for measuring [...].
Thus Cattell (3, p. 81) h[...]
indicates: that the subject prefers an art [...]
on a fine afternoon; that he does not [...]
his emotions under control; that he [...]
on in personal matters [...]
characters have more [...]
the nation than most people [...]
admit to fits of dread or [...]
to talk a great deal in his sleep. Cattell
has tentatively labelled [...] “Bohemian Unconcernedness.”

It is not that a factor [...]
notion of developing a scale designed to measure “Bohemian
Unconcernedness.” As a m[...]
analytic procedure that [...]
application of factor [...] each of the factors obtained.
It is also characteristic [...]
[...] by the capacity of the modern electronic computer.


The criterion group approach demands that we have two
contrasting groups of subjects available. For example, one of these
groups may consist of individuals labelled as schizophrenic by psy- 
chiatrists and the other of individuals not so labelled. These two 
groups are given a set of items and differences in the responses of 
the two groups to each item are examined. Tests of significance may 
be applied as a basis for selecting those items for which there is 
a statistically significant difference in response between the two 
criterion groups. Thus, it may be found that a significantly larger 
number of schizophrenics than normals answer True to the item: 
I frequently have pains in my feet. This item will then be selected 
for inclusion in a scale—along with any other additional items that 
differentiate between the two groups. Item selection is rigorously 
empirical, and the person who uses the criterion group approach 
is, in general, not at all concerned with item content. He asks only 
that the items included in the scale be those that have been found
to differentiate between the two groups of interest. The name
assigned to the variable supposedly measured by the scale is based
on the nature of the criterion groups used in their selection. Thus,
MMPI scales have been constructed to measure schizophrenia,
delinquency, depression, hysteria, low back pain, and so forth. The
number of scales which can be constructed following the criterion
group approach is limited only by the number of contrasting groups
that can be found.
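
The item selection step of the criterion group approach can be sketched as follows. The text says only that tests of significance may be applied; the pooled two-proportion z test, the 1.96 cutoff, the item labels, and the counts used here are assumptions chosen for illustration.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se if se > 0 else 0.0


def select_items(true_a, n_a, true_b, n_b, z_crit=1.96):
    """Keep items whose True rates differ between the criterion group (a)
    and the contrast group (b); return each item's keyed response."""
    selected = {}
    for item in true_a:
        z = two_proportion_z(true_a[item], n_a, true_b[item], n_b)
        if abs(z) >= z_crit:
            # Key the answer given more often by the criterion group.
            selected[item] = "True" if z > 0 else "False"
    return selected


# Invented counts of True answers among 100 patients and 100 normals.
patients = {"pains in feet": 40, "likes parties": 10, "reads daily": 50}
normals = {"pains in feet": 15, "likes parties": 30, "reads daily": 48}

scale = select_items(patients, 100, normals, 100)
print(scale)
```

Nothing in the procedure looks at what an item says: an item enters the scale solely because the two groups answer it differently.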

If the goal of the criterion group approach is to develop a scale 
useful in the prediction of membership or lack of membership in 
groups comparable to the original criterion groups, the procedures 
followed seem highly appropriate. But if scores on the scale so 
developed are treated, as they so often are, as measuring variation 
along a single continuum or dimension of personality, that is another 
matter. No matter how rigorously the criterion groups are defined, 
it does not seem at all possible that they can ever be made com- 
parable in all respects but one. It may be possible, of course, to 
equate them for such variables as age, sex, socioeconomic status, 
and so forth, but it is well known that as the number of variables 
on which two groups are matched is increased, there is a correspond- 
ing decrease in the number of cases that meet the requirements for 
membership in the criterion groups. If we retain substantial N’s 
in both groups, then we may have groups differing with respect to
the criterion, but this criterion will of necessity be complex, a
multiplicity of many things.

Perhaps the reason so many scales have been developed to
measure clinical rather than normal personality variables is because
criterion groups for clinical variables are available in hospitals and
institutions. We should keep in mind, however, that a scale
developed in this manner can never be better than the criterion group
which provided the basis for item selection. Thus, if a criterion
group is established by psychiatric judgment and if psychiatric
judgment is fallible, as it surely is, this may mean that if we use
the judgments of other psychiatrists to establish the criterion group,
it will not be the same as the original criterion group. This, in turn,
may result in a different set of items being selected for inclusion
in the scale than those selected on the basis of responses of the
original group. Two scales so developed, although supposedly
predicting the same criterion, may bear little relation to one another.

In using the construct approach to the development of a
personality scale, the psychologist starts with at least a vague notion of
a personality variable that is of interest to him. He may have noted,
for example, that some people seem to desire to be the center of
attention. They like to entertain others, to tell [...],
make themselves conspicuous by wearing [...]
[...] the construct.” The kinds [...]
are believed to be rel[...]
[...] on initially. For example, one [...]
exhibition may not be the same as an[...]
[...]ey may use the s[...]
of the construct may be quite different with the result that the
items in the two scales may also be different. Further research with
the scales, along the lines suggested by Peak (25) and by Cronbach
and Meehl (7), with respect to construct validity may prove of
value in clarifying the difference between the two constructs
represented by the two scales.


ITEM ENDORSEMENT AND SOCIAL DESIRABILITY SCALE VALUE


Regardless of the approach used in the development of a
personality scale, there are certain common problems relating to the
finished product. One of these relates to what I have come to call
the social desirability variable and it is this problem that I now wish
to discuss.

You are all familiar with the methods devised by Thurstone for 
scaling attitude statements. A number of statements relevant to 
some issue or institution are collected and these are submitted to 
a judging group. The judging group is not asked to respond in 
terms of whether they agree or disagree with each statement, but 
only to judge the degree of favorableness of each statement on, say, 
a 9-point scale. On the basis of the distribution of judgments for 
each statement, scale values are obtained by either the method of 
equal-appearing intervals or the method of successive intervals.¹
The scale value of a given statement is taken as an indication of
its location on a psychological continuum such that high values
indicate very favorable statements and low values very unfavorable
statements.
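
A simplified sketch of obtaining scale values from judges' ratings is given below. It assumes the equal-appearing intervals convention of taking the median of the judgments as the scale value, and it uses the plain sample median rather than interpolation within a grouped distribution; the statements and ratings are invented.

```python
from statistics import median


def scale_values(judgments):
    """Scale value of each statement = median of the judges' ratings
    on the 9-point scale (a simplification; see the note above)."""
    return {stmt: median(ratings) for stmt, ratings in judgments.items()}


# Invented ratings by a small judging group (1 = very unfavorable,
# 9 = very favorable).
judgments = {
    "statement A": [8, 9, 7, 8, 9],
    "statement B": [2, 1, 3, 2, 2],
    "statement C": [5, 4, 6, 5, 5, 6],
}

values = scale_values(judgments)
for stmt, value in sorted(values.items()):
    print(stmt, value)
```

The judges' own agreement or disagreement with a statement plays no part here; only their placement of the statement on the continuum is used.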

I have applied these methods in scaling statements of the kind 
that we ordinarily find in personality inventories. The instructions 
given to the judging group are such that they are not asked to
respond in terms of whether they agree or disagree with each
statement, or in terms of whether they think it does or does not describe
them, but rather they are asked to judge the degree of social
desirability or undesirability of each statement. In other words, I
ask them to rate how desirable or undesirable they would consider
the behavior or characteristic in other individuals. On the basis of
the distribution of judgments, a scale value is obtained for each
statement by one of the psychological scaling methods. The scale
value of a statement is taken as an indication of the location of the
statement on a psychological continuum ranging from highly
socially undesirable to highly socially desirable. High scale values
indicate statements that are socially desirable and low scale values
statements that are socially undesirable.

¹ These [...] methods of psychological scaling are described by Edwards (12)
and Guilford [...].

Suppose that we have obtained social desirability scale values 
for a large number of statements. The statements are then printed 
in the form of a personality inventory. A new group of subjects is 
given the inventory and they are asked to respond to each statement 
in the usual manner of obtaining self-descriptions. For each state- 
ment we find the proportion of those responding Yes or True and we 
then plot these proportions against the corresponding, but
independently obtained, social desirability scale values. The first time
that I did this I found a linear relationship between the two
variables. The product-moment correlation between the proportion
endorsing an item and the social desirability scale value of the item
was .87 (8).
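
The computation described here, a product-moment correlation between the proportion endorsing each item and the item's social desirability scale value, can be sketched as follows. The five pairs of values are invented and deliberately tidy; they are not the data behind the reported coefficient.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented values for five items: scale value from the judging group,
# proportion answering True in a separate self-description group.
sd_scale_values = [1.5, 3.0, 4.5, 6.0, 7.5]
proportion_true = [0.10, 0.30, 0.45, 0.70, 0.85]

r = pearson_r(sd_scale_values, proportion_true)
print(round(r, 2))
```

The two sets of numbers must come from independent groups, as in the text: one group judges desirability, the other describes itself.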


Calvin Wright (28) repeated this study with a minor variation.
He gave 140 items to 127 college students and asked them to rate
the degree to which each statement characterized them on a 9-point
scale. The mean rating assigned to the statements in
self-description was then correlated with the social desirability scale values
of the statements. The product-moment correlation between these
two variables for this sample was .88.


Using a Q-sort to obtain self-descriptions with the same state- 
ments and with still another 
tween mean Q-sort ra 
females and .84 for th 
I have also scaled 


A social disirability scale 
in the ICL was .83 (11). 


bsequently been co 


ie and social desirability scale 
y selected from the § hizophrenia 
scale, and correlation of .82 and beit 


.86 between probability of endorse- 
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ment and social desirability s 7 5i 
nna from the D or oe = REN 
deai poor possible to generalize, on the basis of the studies 
Ver ae ech eu that, whenever we have a personality in- 
een 2 i T items in the inventory vary with respect to 
Mn ani abi lity scale values, we may expect to find a sub- 
Ki femal ve corr elation between probability of endorsement of 
ps naker eo. desirability scale value of the item. Consider 
ee ae ication of this finding. Although Hanley (19) 
ed the relationship between probability of endorsement 


and i i 

items in the Sc scale of the MMPI using only 32 of the 79 items 

was random. Let us 

set f hold for the complete 
of 79 items. Now, recall how the Se scale was developed. To be 

ifferentiate significantly 


renic patients and a group 


if an it 
em was to be included in the Sc scale, the propor 
would be likely to deviate 


line relating probability of 
value for the normal group. 
between probability of 


endors : 
ement and social desirability scale value is not linear for the 
ossibility. Another is that the 


relati ae 
tionship is linear for the schizophrenic group, but that for this 
d for the normal 
f the Y intercept. Still an- 
regrasci j epts and the slopes of the 
gression lines differ for the two groups. Or, perhaps the social 
lished by the judgments 


stablished by 


a research. We 
(22), indicating that 
judgments of diagnosed psychotic patients, are related to the
social desirability scale
values of the same items obtained from judgments of a non-psy- 
chotic group. Klett, for example, reports a correlation of .88 be- 
tween the social desirability scale values based upon the judgments 


of psychotic patients and scale values based upon the judgments of 
college students.


THE SOCIAL DESIRABILITY HYPOTHESIS


to disagree or dissent, th 
gory is provided, the n 
of a general tendency t 

If the majority of th 
a high score on the invento 


M der the tendency to respond True or the 
tendency to respond False as of primary importance in personality
inventories. My reason for this belief is that both of these response 


hypothesis, which I have called the social desirability or SD
hypothesis. This hypothesis proposes that, just as individual
differences have been found in the tendencies of subjects to respond
True, regardless of item content, so also are there individual
differences in the tendencies of subjects to give socially desirable
responses to items in personality inventories, regardless of whether
the socially desirable response is True or False. I have
devised various scales to measure this tendency and these scales 
are referred to as Social Desirability scales or SD scales (9, 18). 
An SD scale is relatively easy to develop. Suppose we take any
heterogeneous set of personality statements and scale them for social
desirability. We desire items heterogeneous with respect to content
simply because we do not wish subjects who are to be given the
developed SD scale to believe we are measuring some particular
personality variable, such as, for example, dominance. On the basis
of the evidence cited previously, we expect to find a linear
relationship between probability of endorsement of these items and their
social desirability scale values. To develop an SD scale we take
those items with socially desirable scale values and key the True
response. For those items with socially undesirable scale values,
we key the False response. A person’s score on the scale is simply
the number of times he has given the keyed response in
self-description, that is, the number of socially desirable responses he
has given. As I have said earlier, I have developed a number of such
SD scales, but most of the research that has been done to date is
based upon a scale consisting of 39 items from the MMPI (18).
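
The keying and scoring rule just described can be sketched in a few lines. This is a minimal illustration, not the procedure used to build the 39-item scale: the neutral point of 5.0 (the midpoint of an assumed 9-point rating scale), the item labels, and the scale values are all hypothetical.

```python
NEUTRAL_POINT = 5.0  # assumption: midpoint of a 9-point social desirability scale


def build_sd_key(scale_values, neutral=NEUTRAL_POINT):
    """Key True for socially desirable items and False for undesirable ones.

    Items lying exactly at the neutral point are left out of the key.
    """
    key = {}
    for item, value in scale_values.items():
        if value > neutral:
            key[item] = True
        elif value < neutral:
            key[item] = False
    return key


def sd_score(responses, key):
    """Number of keyed (socially desirable) responses the subject has given."""
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


# Hypothetical items and scale values:
scale_values = {"item_a": 7.8, "item_b": 2.1, "item_c": 6.4, "item_d": 5.0}
key = build_sd_key(scale_values)

# One subject's True/False self-description:
responses = {"item_a": True, "item_b": True, "item_c": True, "item_d": False}
score = sd_score(responses, key)
```

Here the subject gives the keyed response to item_a and item_c but not to item_b, and item_d is unkeyed because its scale value is neutral.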
Now, let us suppose we take any existing personality inventory
of interest and examine the scoring key for the items contained in
the inventory. If the trait being measured by the inventory is itself
a socially desirable trait, then we would expect to find a majority
of the keyed responses to be socially desirable also. The scoring
key for the trait, in essence, would be much the same as the scoring
key we would obtain if we keyed the responses as we would in
developing an SD scale. If the inventory were scored by each key,
we would expect to find a high and positive correlation between the
scores resulting from each key. This should, in general, be true for
all personality inventories designed to measure traits which are
themselves considered socially desirable. Similarly, if a high score
on a given personality inventory indicates a trait that is itself
considered socially undesirable, then the scoring key for this trait
should be just the reverse of the scoring key we would obtain if
we keyed the same items as in an SD scale. Scoring the inventory
by each key, we would expect to find a high negative correlation
between scores resulting from each key, that is, the trait key and
the SD key.
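
The comparison of a trait key with an SD-type key can be made concrete by counting how often the trait-keyed response is also the socially desirable response. The sketch below assumes a neutral point of 5.0 on the scale of social desirability values; the item labels, values, and keys are hypothetical.

```python
def socially_desirable_response(value, neutral=5.0):
    """Return the socially desirable response (True or False), or None if neutral."""
    if value > neutral:
        return True
    if value < neutral:
        return False
    return None


def key_overlap(trait_key, scale_values, neutral=5.0):
    """Fraction of non-neutral items whose trait-keyed response is the desirable one."""
    matches = 0
    counted = 0
    for item, keyed in trait_key.items():
        desirable = socially_desirable_response(scale_values[item], neutral)
        if desirable is None:
            continue  # a neutral item has no socially desirable response
        counted += 1
        if keyed == desirable:
            matches += 1
    if counted == 0:
        raise ValueError("no non-neutral items to compare")
    return matches / counted


# Hypothetical scale values and two hypothetical trait keys:
scale_values = {"i1": 7.5, "i2": 2.0, "i3": 3.1, "i4": 6.8}
undesirable_trait_key = {"i1": False, "i2": True, "i3": True, "i4": False}
desirable_trait_key = {"i1": True, "i2": False, "i3": True, "i4": True}

low = key_overlap(undesirable_trait_key, scale_values)
high = key_overlap(desirable_trait_key, scale_values)
```

An overlap near 1 corresponds to the case in which the trait key is much the same as the SD key, and an overlap near 0 to the case in which it is just the reverse.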
In the cases described, it could be argued that the resulting
correlations were artifacts of the scoring keys applied to the same
items. By having available a separate and independently constructed
SD scale, based upon a different set of items, and by correlating
scores on this SD scale with those of a given personality
inventory, we are no longer correlating two sets of scores necessarily
dependent by virtue of scoring the same set of items by two keys
which are not themselves independent.


CORRELATIONS BETWEEN THE SD SCALE AND OTHER SCALES 


A person with a high score on the SD scale can be described as 
one who has given a large number of socially desirable responses in 
self-description, whereas a person with a low score can be described 
as one who has given relatively few socially desirable responses in 
self-description. If this is a stable and consistent personality char- 
acteristic, we should find it evidenced in performance on a variety 
of other personality inventories, regardless of the particular traits 


supposedly being measured by these inventories.
pose we have an inve 


desirable trait. The 
desirable responses i 


» Responsibility, and Status 
Ween scores on these scales 
> as reported by Merrill and 
AY, ‚52, and .61, respec- 


Within the MMPI we can find a wide variety of scales for which
high scores indicate socially undesirable traits. Correlations were
obtained by Merrill and Heathers (24) between scores on these scales
and scores on the 39-item SD scale for a sample of 155 males. The
tetrachoric correlations are as follows:
MMPI Scales              Correlation with the 39-Item SD Scale

gay a                                 -.50
Dependency                            -.73
Hostility                             -.75
Manifest Anxiety                      -.84
Social Introversion                   -.90


For the various clinical scales of the MMPI, Merrill and Heathers (24)
report the following tetrachoric correlations with the 39-item SD
scale for the same sample:


MMPI Scales                  Correlation with the 39-Item SD Scale

Hs  Hypochondriasis                       -.52
Pt  Psychasthenia                         -.55
Sc  Schizophrenia                         -.77
D   Depression                            -.61
Pd  Psychopathic Deviate                  -.50
Hy  Hysteria                               .08
Pa  Paranoia                              -.09
Ma  Hypomania                             -.13
The three lowest correlations with SD are those of .08, -.09,
and -.13 for Hysteria, Paranoia, and Hypomania, respectively.
In my work with the SD scale, I have become so accustomed
to finding substantial correlations between SD and scores
on other inventories that, when low correlations are obtained, I
seek for an explanation in terms of the relationship between the
social desirability scale values of the items and the manner in which
the item responses are keyed.

2 According to Berg’s (2) deviant set hypothesis, if a subject tends
to make responses avoided by the majority of a group of subjects,
this tendency may be related to other forms of deviancy from
normative standards. On the SD scale, deviant responses would
result in low scores. On the SD scale, then, deviancy would be more
or less synonymous with “social undesirability” which, in turn, has
been shown to be related to “abnormality,” as measured by the
clinical scales of the MMPI.


FACTORS INFLUENCING CORRELATIONS WITH SD


In general, a low correlation between scores on the SD scale and 
scores on another personality inventory could result from at least 
two conditions. We know, for example, that, if the trait being 
measured by an inventory is itself socially undesirable, then, in 
general, most of the keyed responses will, in turn, be socially un- 


on the inventory, the subject must, 
y undesirable characteristics. Sup- 
o obtain at least some items such 


he MMPI are such that the keyed 
response. Then to these items, a 


scale, 


S on this point is available. A study by Hanley (19) 
indicates that approximately 75 per cent of the items in the Sc scale 
have socially undesirable scale values, with 25 per cent falling in 
ly desirable categories. For the D scale, on 


approximately 52 per cent of the items have 
cale values, wher 


tems in the D scale is somewhat 
y scale values of the items than 
ould, therefore, expect to find, as 


a e SD scale and se ther 
ersonal 3 and scores on ano 

P ality inventory, We might, for example, have an inventory 

f the items have social desirability 

on of the psychological continuum. 




That is, these items may be relatively neutral with respect to their 
social desirability scale values. If we have a number of items with 
neutral scale values, a subject whose responses are primarily in- 
fluenced by social desirability considerations will be in a quandary 
as to how he should respond. If the scale value of an item is truly 
neutral, then there is no socially desirable or undesirable response 
that can be made by a subject in answering it. In this situation, 
we might argue, his responses are more likely to be influenced by 
the content of the item. The correlation between SD and scores 
on the inventory should thus decrease as the number of neutral
items in the inventory is increased. 


SUBTLE ITEMS 


Some years ago, Wiener (27) attempted to classify the items in
the various MMPI scales into two groups, one of which he called
subtle and the other obvious. For five of the MMPI scales he was
able to find two such groups of items. The three scales, Hysteria,
Paranoia, and Hypomania, for which low correlations with SD are
reported by Merrill and Heathers (24), are among the five. The two
additional scales are the D and Pd scales. Hanley (19) has suggested
that subtle items are those with neutral social desirability scale
values. I have expressed the opinion that not only may a neutral
item be a subtle item, but that any item for which a socially
desirable response is keyed as a sign of a socially undesirable trait may
be a subtle item (18). In the case of socially desirable traits, a subtle
item would be one for which the socially undesirable response is
keyed. Recall that I define socially desirable and undesirable
responses on the basis of an item’s social desirability scale value.
Let us accept this hypothesis concerning subtle items, for the
moment, and see if we can predict what we should find when we
correlate scores on the SD scale with those on the subtle and obvious
scales of the MMPI. For the obvious scales, we should have more items
for which the keyed response is a socially undesirable response
than in the case of the subtle scales. The subtle scales, on the other
hand, should contain more neutral items and/or more items for
which the keyed response is a socially desirable response than in the
case of the obvious scales. If this argument is sound, then we should
find a substantial negative correlation between the SD scale and the
obvious scales. For the subtle scales, the correlations
should definitely be lower, with the magnitude and sign of the
correlation depending solely upon how many neutral items the scale
contains and upon the number of items for which the socially
desirable response is keyed. If we have many items for which the
socially desirable response is keyed, the correlation with SD should
be positive in sign.

At my suggestion, Fordyce and Rozynko (14) drew a sample of
50 MMPI records from the files of a VA hospital and obtained 
product-moment correlations between scores on the 39-item SD 
scale and total scores on the D, Pd, Pa, Ma, and Hy scales. They 
then calculated the correlations between the SD scale and the 
separate subtle and obvious scales. The results are as shown below: 


Correlations with the 39-Item SD Scale


MMPI Scale       Total       Obvious       Subtle
D                 -.69         -.78          .33
Pd                -.67         -.85          .27
Pa                -.52         -.72          .06
Ma                -.08         -.53          .40
Hy                -.28         -.71          .54


Note that in every instance the negative correlation of SD with
the obvious scale is greater than it is with the total scale consisting
of both subtle and obvious items. This is as it should be. The
correlations between SD and the subtle scales, on the other hand, are


ame magnitude as the negative cor- 


obvious scales indicates that the subtle 
eyed sociall 


ble time with the MMPI and social de- 


cause I believe the MMPI to be the only 
which the social 


points I have made would apply, I believe, 
equally well to any other inventory of the True-False kind.


MINIMIZING SOCIAL DESIRABILITY


If we believe scores on objective personality inventories to
be influenced by the social desirability variable, what can we do
about it? One possibility is that we can attempt to correct for
social desirability by means of such scales as the SD scale. For
example, if we know the correlation between the SD scale and
scores on another personality inventory, then we can predict the
score that a person would receive on the inventory by means of a
linear function of his score on the SD scale. If we
then subtract the predicted score from the actual score, it can
readily be shown that these deviation scores will be uncorrelated
with the SD scores. Unfortunately, however, the correlations
between SD and scores on various personality inventories are of such
magnitude that the residuals or deviation scores may represent little
more than error variance. It is well known that the reliability of
difference scores is, in general, considerably lower than that of the
separate measures entering into the difference scores.

Another possibility would be to search for items that are relatively
neutral with respect to their social desirability scale values. I do
not know whether this is a hopeless search or not. I can only say
that, on the basis of my experience in scaling personality items, the
number of items with relatively neutral scale values is much smaller
than the number I find with socially desirable or socially undesirable
scale values.
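
The correction by deviation scores mentioned earlier in this section can be illustrated as follows. The sketch assumes an ordinary least-squares linear prediction of inventory scores from SD scores; the scores themselves are hypothetical. It shows only that the residuals are uncorrelated with SD, and it says nothing about their reliability, which is the difficulty raised in the text.

```python
def deviation_scores(inventory_scores, sd_scores):
    """Actual minus predicted inventory score, predicting linearly from SD scores."""
    if len(inventory_scores) != len(sd_scores) or len(sd_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(sd_scores)
    mean_sd = sum(sd_scores) / n
    mean_inv = sum(inventory_scores) / n
    ss_sd = sum((s - mean_sd) ** 2 for s in sd_scores)
    if ss_sd == 0:
        raise ValueError("SD scores have no variance")
    # Least-squares slope and intercept of the prediction line.
    slope = sum((s - mean_sd) * (v - mean_inv)
                for s, v in zip(sd_scores, inventory_scores)) / ss_sd
    intercept = mean_inv - slope * mean_sd
    return [v - (intercept + slope * s)
            for s, v in zip(sd_scores, inventory_scores)]


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Hypothetical scores for six subjects:
sd_scores = [30, 25, 20, 15, 10, 35]
inventory_scores = [8, 12, 15, 21, 24, 5]
residuals = deviation_scores(inventory_scores, sd_scores)
```

The covariance of `residuals` with `sd_scores` is zero apart from rounding error, which is the sense in which the deviation scores are uncorrelated with SD.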

Along these same lines, we might seek items such that the socially
desirable response is the keyed response in scales designed to
measure socially undesirable variables. For scales designed to measure
socially desirable variables, we would, of course, attempt to find
items for which the socially undesirable response is keyed. The five
subtle scales of the MMPI are perhaps the closest approximations
we have, at the present time, to scales of this kind. Additional
research directed toward the development of subtle scales designed to
measure normal personality variables is needed.

A third approach to the minimization of social desirability in
personality inventories is the one I have used in developing the
Personal Preference Schedule. In this inventory, an attempt is made
to minimize the operation of the social desirability variable by
pairing statements representing different personality variables on the
basis of their social desirability scale values in such a way that the
social desirability scale values of the two statements are comparable.
The subject is then asked to choose between the two statements.
In this way, we hope to minimize the probability of the response
being determined by social desirability considerations. I shall not
take time to cite the considerable evidence available which bears upon
the extent to which this forced-choice type of inventory does lessen
the influence of the social desirability variable. It is cited in detail
in my book on the social desirability variable in personality assessment
(13).
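
The pairing of statements on the basis of comparable scale values can be sketched with a simple greedy rule. This is an illustration only and is not the procedure actually used in constructing the Schedule: the tolerance of 0.3 scale units, the statement labels, the variable names, and the scale values are all hypothetical.

```python
def pair_statements(statements, tolerance=0.3):
    """Greedily pair statements from different variables with similar scale values.

    statements: list of (label, variable, scale_value) tuples.
    Returns a list of (first, second) pairs; unmatched statements are dropped.
    """
    ordered = sorted(statements, key=lambda s: s[2])
    used = [False] * len(ordered)
    pairs = []
    for i, first in enumerate(ordered):
        if used[i]:
            continue
        for j in range(i + 1, len(ordered)):
            second = ordered[j]
            if second[2] - first[2] > tolerance:
                break  # later statements are even further away in scale value
            if used[j] or second[1] == first[1]:
                continue  # already paired, or represents the same variable
            used[i] = used[j] = True
            pairs.append((first, second))
            break
    return pairs


# Hypothetical statements: (label, personality variable, scale value)
statements = [
    ("A1", "achievement", 6.9),
    ("O1", "order", 7.0),
    ("A2", "achievement", 3.1),
    ("D1", "dominance", 3.3),
    ("F1", "affiliation", 5.0),
]
pairs = pair_statements(statements)
```

With these values, A2 is paired with D1 and A1 with O1, while F1 is left unpaired because no statement from another variable lies within the tolerance.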


STATUS OF THE PERSONALITY INVENTORY 


And now, what of the future of the objective personality inventory?
Let us first go back to the past. In 1945, Kornhauser

i a survey in which he queried specia of 

with various psychological tests. an 
vey had to do with their satisfaction w : 

inventories and also with the Rorschach. ki 

results were more or less a tie, with 51 per cent expressing so 
n with personality inventories and 49 per ce 


I do not have the results of a comparable survey for the year
1958 rather than 1945. I do not think I would be overstating the
case, however, if I said that probably all of us who have attempted
to develop objective personality inventories are not overly satisfied
with the results of our efforts. There is much that remains to be
done in the way of research before we will have personality
inventories that are judged as satisfactory as, let us say, achievement
tests.


Objective Scoring of
Projective Tests 


WAYNE H. HOLTZMAN
The University of Texas 


Ever since L. K. Frank's first use
of the term “projective method” in 1939 (15), there has been a rapid
mushrooming of techniques for encouraging an individual to reveal
aspects of his personality by the way in which he perceives,
organizes, or relates to potentially affect-laden, ambiguous stimuli.
Stemming largely from psychoanalytic theory, such projective techniques
range all the way from free association in relatively unstructured
situations to rather highly structured, formalized devices such as
the Thematic Apperception Test. Before considering the problems 
of quantification and objective scoring, it might be instructive to 
examine closely the assumptions implicit in the projective method 
as contrasted to those underlying psychometric tests and measure- 
ment theory. 


PROJECTIVE COMPARED WITH PSYCHOMETRIC METHODS 


Unlike the standardized aptitude test, the projective approach 
deals with the idiomatic expression of the individual as revealed in 
the context of his needs, fears, strivings, and ego-defensive behavior. 
As Frank has so aptly stated, “The essential feature of a projective 
technique is that it evokes from the subject what is, in various ways, 
expressive of his private world and personality process.” (16, p. 47).
Given any projective technique where the subject is offered a wide
latitude in which to reveal himself, the particular sample of re- 
sponses obtained is assumed to reflect significant aspects of the 


subject's personality organization, if only the examiner can find the 
key to its interpretation.


Macfarlane and Tuddenham have 
morphic assumption concerning the 
personality leads to three corollaries t 


pointed out that such an iso- 


minants of each and every 
belief that projective tests 
ality equally in different individ- 
© wary, sophisticated projectivists 
sumptions necessarily fol- 
assumption underlying the projective method— 
f projective test protocols is but a tiny fragment 
ality, fraught with innumerable possibilities for 
‚ in actual practice it is difficult to 


ity from projective protocols often reveal more about the personality
of the clinician than that of the subject.


In contrast to a


Projective tech 


of validity (18), iability and the concept 
Contrary to the opinion of some writers (37), such psychometric
theory is not necessarily limited to a nomothetic approach where one
is interested in group or inter-individual differences. As Cattell (6)
has been quick to point out, one can legitimately utilize psycho- 
metric theory for idiographic purposes by considering k different 
measures on m different occasions for a single person. Nor need 
psychometric theory be restricted to consideration of one response 
variable at a time—the oft heard criticism that a psychometric, 
statistical, or quantitative approach is too atomistic to provide more 
than a ridiculous caricature of the individual personality. While it 
is true that most contemporary uses of test scores deal with isolated 
traits, or at best with linear combinations of several traits, the advent 
of configural scoring methods (30), the possibilities of profile analysis 
(19), and other complex, multivariate procedures open new vistas 
for effective utilization of psychometric theory in the study of the 
individual personality. 

Use of psychometric theory as a basis for assessment of personality
commits one to a trait theory of personality. Postulating some
sort of “true” score as a hypothetical construct to be inferred from
observed scores is tantamount to saying that John Doe has X amount
of the trait in question. It is not necessary, however, to think of
John’s possession of the trait as a “fixed” quantity. An individual's
true score remains invariant only so long as the specific testing
conditions remain constant and there is no real change in the individual
with respect to the trait in question. A primary purpose of test
standardization is to minimize constant sources of error that are
ordinarily confounded with the inferred true score. Only errors of
measurement that are random in nature can be adequately assessed
and taken into account by the usual concepts of reliability and
validity within contemporary psychometric theory.

Rosenzweig (37) has observed that assessment procedures can be
ordered on a continuum depending upon the degree of structuring
and control introduced by the assessor. At one extreme are the
completely qualitative, unstructured methods of psychoanalysis,
free association by a patient in the presence of an analyst. At the
other extreme are highly structured paper-and-pencil tests which
meet all the standards of psychometric theory. Projective techniques
are seen as falling somewhere in between, the particular
position on the continuum depending upon the degree of
standardization and control. In most instances, the projectivist has tried to
preserve the qualitative, idiographic essence of the projective
method while also searching for ways in which to categorize,
quantify, and standardize the response variables underlying test
behavior. He would like to have a technique for assessing personality
which covers a wide band of the above continuum with a high
degree of power throughout the range. Very few psychologists indeed
have completely and consistently refrained from some form
of abstraction later leading to quantification.

As soon as an individual decides to classify and enumerate any
characteristics of a subject’s responses to a projective technique,
however crude and elementary the system, he has shifted from a
purely projective point of view to a psychometric frame of reference.
Such measurement may be quite nominal and only faintly resemble
full-blown quantification. Nevertheless he has made the first and
most significant step by classification of responses. For example, to
classify a given response to an inkblot as a W assigns meaning to
the response that transcends the idiosyncratic, private world of the
subject. Unless one considers such symbols as W, D, and d mere
short-hand devices that have no real meaning beyond calling one’s
attention to certain aspects of the protocol, the symbols take on
nominal characteristics of measurement. Those subjects who use the
whole inkblot are seen as one class of individuals (W-tendency type),
while those who use only a small part of the inkblot for their
response are seen as another class (d-tendency type).


Such symbols of classification can be 
specified characteris 


or less elaborate p 
or empirically, w} 
tribute to be inferred from th 


aw protocol. More 
ed, either rationally 


ach pro 
ndicatin 


‚are th only two movement responses. Such a 
statement implies a crude kind i 


of ordinal le } hich people 
can be ordered accordi i ef a whieh: peop 
the total number of r

As one becomes engrossed with the counting of symbols it is
very easy to forget the nature of the projective material being 
classified. In his eagerness to make a given technique meet the 
demands of both psychometric and projective theory, the psychol- 
ogist often compromises the two sets of conflicting standards to the 
point where the technique fails to accomplish either aim. There 


are some projective devices that should always be treated by qual- 


itative methods of analysis since almost any attempt to abstract 


quantitative scores will fail to have any meaning. Other projective 
techniques may be altered sufficiently to yield scores meeting ac- 
ceptable psychometric standards while at the same time preserving 
the projective nature of the task. It is too much to expect a tech- 
nique designed originally as a purely projective method to lend 
itself to a meaningful kind of quantification without some revision, 
and in many projective techniques no amount of revision will pro- 
duce adequate scores in the true psychometric sense. 

Frank (16) has divided the projective techniques into five general
kinds: constructive, interpretive, constitutive, cathartic, and
refractive. The constructive methods consist of those techniques which
require the subject to arrange materials into larger configurations
or to produce drawings as in the Draw-A-Person Test. The
interpretive methods are primarily verbal-associational techniques such
as the Thematic Apperception Test. The best known example of a
constitutive method is the Rorschach in which the subject must
organize relatively amorphous, unstructured inkblots into meaningful
concepts. While most projective techniques may stimulate cathartic
reactions, some, such as play therapy with dolls, are designed
specifically for this purpose. The last of Frank’s classes, the refractive
method, is based upon the fact that any conventionalized mode of
communication (handwriting, gestures, and other forms of expressive
movement) may be used as an approach to the individuality
of a person.
The above classification serves as a convenient basis for a more
detailed discussion of scoring problems and quantification in the
analysis of projective techniques. Since cathartic methods cut across
the other procedures, and since the analysis of expressive movement
and individual style of communication can be considered as a special
topic apart from more conventional projective methods, only the
first three of Frank’s classes will be discussed. Considerably more
attention will be given to the Rorschach and related techniques
than to the constructive or interpretive methods, partly because the
Rorschach has been studied longer and more exhaustively than any
other projective test and partly because it provides an unusually 
good illustration of various problems of quantification encountered 
throughout the projective-psychometric continuum. 


CONSTRUCTIVE METHODS


The way in which a child or adult arranges miniature life toys, 
draws a figure of a man or woman, or builds mosaics from colored 
pieces can reveal a great deal about his personality. Generally
speaking, however, such creative productions are very difficult to 
analyze in any objective, quantitative fashion. Most clinicians only 
use qualitative procedures when dealing with constructive methods. 


Occasionally the characteristics of a construction may be classified 
to formalize its description, but inferences regarding personality, 
whether based u ic i 


O prove useful in the 
ative products, even though the 
ghly structured as in the Bender-Gestalt 
struction has to be viewed as a whole or as 


parate units analogous to test items. 
» color, shading, and ot} 


special cases, fairly successful attempts have 
limited aspects of such 


ic analysis we Chover and others have developed 
Ic analysis utilizing a sign approach to the scoring 
System, the subject must draw both 
a man and a woman so tha Comparisons of self-sex and opposite- 
f . ood a B : : 
method is the scale of figure Si . example of this graphic sign
analysis. Criterion groups for the initial selection of items consisted
of college students with high and low field dependence as
measured by a battery of perceptual tests. A total score is obtained by
summing the number of signs checked during the detailed analysis
of the two figure drawings. Some of the signs are completely
objective, such as transparency, lack of ears, or hair shaded. Others,
like consistency rating and rigidity rating, are subjective and
require a clinical judge. For the most part, however, the list of signs
is sufficiently objective to merit further study.

Graphic signs have been used with similar success by Pascal and 
Suttell in the objective scoring of drawings in the Bender-Gestalt 
Test (34). The test consists of nine geometric forms that are copied 
by the subject. The number of scorable signs on each design varies 
from 10 to 18, with seven additional signs dealing with the total 
configuration of all nine drawings. Each sign is given a numerical 
weight varying from one to eight. The size of the weight was 
empirically determined in earlier studies differentiating normals 
from such groups as psychotics and organics.

A single score is obtained by summing the weights of positive
signs, the higher the score the more pathological the record.
Although much valuable information may have been sacrificed at
the expense of obtaining a single quantitative index, the resulting 
score has sufficiently high reliability and validity in a variety of 
situations to prove highly useful as a screening procedure. 
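
The single weighted score described above amounts to summing the weights of whatever signs are judged present. The sketch below uses hypothetical sign names and weights chosen within the one-to-eight range mentioned in the text; it is not the published scoring key.

```python
def weighted_sign_score(sign_weights, signs_present):
    """Sum the weights of the signs judged present in a record.

    sign_weights: dict mapping sign name to its numerical weight.
    signs_present: iterable of sign names scored as positive.
    """
    present = set(signs_present)  # each sign counts once per record here
    unknown = present - set(sign_weights)
    if unknown:
        raise KeyError(f"signs without a weight: {sorted(unknown)}")
    return sum(sign_weights[sign] for sign in present)


# Hypothetical signs and weights (each weight between 1 and 8):
sign_weights = {"rotation": 8, "tremor": 4, "workover": 2, "dashes_for_dots": 3}
record_score = weighted_sign_score(sign_weights, ["rotation", "workover"])
```

A higher total would be read as a more pathological record. In an actual scoring scheme the signs would be tallied design by design, a detail this sketch leaves out.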

A third variation of semi-structured drawing which represents 
an attempt at objective quantification is the Drawing-Completion 
Test described by Kinget (23). Eight squares are presented to the 
subject, each containing small, but suggestive, stimuli such as a dot, 
a wavy line, or a black square, around which the subject draws 
whatever he wishes. Kinget has attempted to develop a graphic
system with a series of crudely quantitative variables, some based
on content analysis and others dealing with style and expressive
features of the drawings. A personality profile is constructed by
recording signs and then adding them together in more abstract
categories, somewhat like the first attempts to quantify the
Rorschach. While the rationale behind the scoring system is highly
speculative and smacks of arm-chair analysis without adequate
empirical support, the method itself is interesting and sufficiently
novel to deserve careful study.

Working with spontaneous finger paintings, a construction which
has proved very difficult to quantify, Dorken (10) has developed a
series of objectively defined rating scales for energy output, affective
range, contact with reality, and clarity. Pictorial norms were used
as points of reference to anchor the scales. The variable, Affective
Range, illustrates the technique. “Spontaneous” colors, red and
yellow, were each assigned scale values of three, blue and green
were given values of two each, and the “somber” colors, black and
brown, were each scored one. Combination colors were scored in
relation to this primary scale. Test-retest reliability ranged from
.18 to .84, depending upon the sample and time interval between
administrations. By using a series of finger paintings, reasonably
adequate summary scores on the four variables defined by Dorken
should be possible.


It is significant to note that in each of the above examples of [...]
weighted, and [...] “global” but quantitative, measure [...]
important dimensions of personality. Ideally, the sign approach
should begin with sufficient theoretical rationale to construct a
coherent system. After careful operational [...] objectivity of
scoring should be de[...] trained individuals independently [...]
protocols. In some instances where [...] in their definition, their
con[...] a weighting system that has maximum efficiency for
predicting [...]. In any case, the burden of proof concerning the
reliability and objectivity of any [...] system rests with the
individual who proposes it.


[...] stories was made by Mor[...]. [...] past 20 years, Murray’s
Thematic Apperception Test has become a standard projective
technique, second [...] in the clinic and laboratory. Numerous other
interpretive methods stem more or less directly from Murray’s
pioneering work and attest to the fruitfulness of the basic method:
Rosenzweig’s Picture-Frustration Study (86), Bellak’s Children’s
Apperception Test (22), and Shneidman’s Make-A-Picture-Story Test
(48), to mention but a few.
Interpretive methods range all the way from one end of the
projective-psychometric continuum to the other. Representative of the
purely projective approach is the standard TAT analyzed entirely in a
qualitative manner, focusing upon the content of stories and
stylistic aspects of the story telling as illustrated by Stein (44).
Such analysis draws heavily upon careful deduction and clinical
intuition. Only one step removed from this intuitive approach is the
more formal kind of qualitative analysis in which various
characteristics of each story are classified according to theme
expressed, kinds of affect, need categories, and the like. Such
qualitative systems tend to vary considerably according to the
predilection of the analyst.

Representative of the diverse approaches to analysis of TAT proto- 
cols is Shneidman’s (43) compilation of systems used by 15 different 
authorities working with the same TAT record. 

Several investigators have developed sets of rating scales to be 
used with the TAT. One of the most extensive systems is Hartman’s 
(21) consisting of five-point scales for 65 categories covering the- 
matic elements, feeling qualities, topics of reference, and more 
formal characteristics, each of which can be scored for a given
story. Total scores are obtained by summing ratings across stories.
While such scales utilize the clinical skill of the interpreter,
serious difficulties often arise when one is concerned with the
objectivity of the scoring. When categories deal with the manifest
aspects of a story, independent raters can generally agree at a
satisfactory level to insure fair objectivity. But as soon as
attention is focused upon covert aspects of the response or upon the
personality of the story-teller rather than his production,
agreement falls off sharply (46).
The reason for this greater subjectivity when dealing with the
personality of the subject is apparent when one examines closely the
nature of the factors influencing response to a TAT picture. Holt
(22) discusses nine different determinants of the manifest response,
ranging from situational context to personal style of the
story-teller. The interpreter is faced with the very complex task of
weighing the probable influence of each factor before arriving at an
interpretation of the subject’s personality. It is somewhat like
having an equation with nine variables, several of which can be
partially discounted while most remain unknown. Several judges will
weigh the unknowns quite differently, resulting in varying ratings.

The distinction between test-oriented systems dealing with formal
characteristics of the response and personality-oriented systems in
which the interpreter makes direct inferences concerning the
personality of the story-teller is fundamental. The more superficial
or concrete the system, the more objective the scoring and the less
relevant the derived variables to the personality of the subject.


Young (51) developed a set of 23 well-defined traits, such as
Anxiety, Dominance, and Need to [...] the personality of the
int[...] trained interpreters independently rated 12 TAT stories from
seven [...] their own personalities, demonstrating the [...] such
methods of analysis. [...]ment Motive, and have demonstrated how it
can be reliably scored in TAT stories. The sco[...] response elements
by objec[...] in terms of the ease [...] themes are evoked. Such data
for the TAT can be roughly thought of as analogous to difficulty
level or other item-parameters in aptitude tests. A recent
application of Eron’s approach demonstrates how Guttman’s scaling
method can be employed using normative TAT data to construct a
uni-dimensional scale for need-Sex (1).

A final example of an objective approach to the scoring of the 
TAT is one devised recently by Dana (9). Three fundamental as- 
pects of test behavior—approach to the situation, normality of re- 
sponse, and rarity of response—were used by Dana to define three 
variables amenable to objective scoring: Perceptual Organization,
Perceptual Range, and Perceptual Personalization. Inter-scorer re- 
liability in terms of percentage agreement between independent 
judges ranged from 76 to 94 for the three scoring categories in a 
study of 150 TAT stories. The unique aspect of Dana’s approach 
is the fact that these three variables are sufficiently pertinent to a 
large variety of projective techniques to permit inter-test
comparisons for sharpening the validity of the personality constructs
involved.
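
Percentage agreement between independent judges, the index of
inter-scorer reliability reported for Dana’s categories, can be
sketched as follows. The judges’ scores are hypothetical, and the
function name is introduced here only for illustration.

```python
def percent_agreement(judge_a, judge_b):
    """Percentage of stories on which two judges assign the same score."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must score the same non-empty set")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


# Hypothetical category scores given to five stories by two judges.
judge_a = [2, 1, 0, 2, 1]
judge_b = [2, 1, 1, 2, 1]
print(percent_agreement(judge_a, judge_b))  # the judges agree on 4 of 5
```

Percentage agreement makes no allowance for agreement expected by
chance, which is one reason correlational indices are also reported
elsewhere in this chapter.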

Variations of the sentence completion method provide much more 
suitable data for psychometric development than the TAT. The
technique consists of providing the subject with a list of incomplete 
sentences to which he responds with whatever completions come 
to mind. By wise selection of sentence stems, content fairly similar 
to the thematic apperception methods can be obtained. Of course 
the response is much more highly structured and discrete from one 
item to the next than is the case with the TAT. Herein lies the chief
virtue of the method with respect to quantification.

Rotter and Willerman (38) developed one of the first sentence 
completion tests with high objectivity. Designed for large-scale 
screening purposes in the Army Air Force, their 40-item version
yielded a single adjustment score having inter-scorer reliability of
.89 and split-half reliability of .85. A refined version of this test
designed for college students, the Rotter Incomplete Sentences
Blank (39), has an objective scoring manual with reported interscorer
reliability of .96 and split-half reliability of .84, unusually high
for a projective technique.
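
A split-half reliability of the kind cited above can be sketched in
a few lines. The odd-even split and the Spearman-Brown step-up are
assumptions of this sketch; the text does not say how the halves
were formed or whether the reported coefficients were corrected.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per subject.

    Returns the correlation between odd-item and even-item half
    scores, and the same value stepped up by the Spearman-Brown
    formula to estimate full-length reliability.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd_half, even_half)
    return r_half, 2 * r_half / (1 + r_half)


# Hypothetical item scores for four subjects on a six-item test.
scores = [
    [1, 2, 1, 1, 2, 1],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 6, 5],
    [2, 3, 2, 4, 2, 2],
]
print(split_half_reliability(scores))
```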
Trites and his colleagues (47) developed a military version of the
sentence completion method to a high degree of objectivity while at
the same time dealing with a number of response-categories rather
than just one. A scoring manual was written on the basis of 1038 test
protocols which yielded interscorer agreement ranging from .80 to .96
for eight major variables: Conformity, Ego Esteem, Gregariousness,
Sexuality Attitudes, Air Force-oriented Motivation, Hostility,
Insecurity, and Unscorable Response. Although there is little direct
evidence to support the validity of the [...] with respect to the
personality constructs implied, in a [...] of inter-item correlations
where the items had [...] dichotomously as indicating either a
positive or [...] with reference to adjustment to flying, Trites (48)
[...] factors which were meaningfully linked to several of the major
variables.
It is instructive to note the characteristics of the sentence
completion method which are responsible for achievement of [...]
psychometric standards. Unlike the TAT, the number of [...] making
possible an atomistic treat[...] undue distortion of the technique.
[...] pictures, each with an infinite [...], the sentence completion
method has [...] for which the variety and extent of [...] the more
circumscribed nature of the tech[...] development of an objective
scoring [...] that may be present in the response. Th[...]ment does
not necessarily reduce the [...]ness of a projective method is
demonstrated by the repeatedly high validity obtained for the Rotter
Incomplete Sentences Blank in assessing level of personal adjustment
(39).


CONSTITUTIVE METHODS

The Rorschach test stands alone among projective techniques in the
amount of attention, both clinical and experimental, which it has
received during the past twenty years and illustrates problems
encountered in scoring responses to constitutive methods.
Quantitative analysis of responses to inkblots has ranged all the
way from one extreme of the projective-psychometric continuum to the
other. Some writers (25, 41) have pointed out how the Rorschach can
be dealt with in a purely qualitative manner [...] and symbolic [...]
nature of th[...] [...]choanalytic t[...] individual difference[...]
enough, the same 10 [...]ughout! [...] these various degrees of
structuring and quantification based upon sound principles of
measurement theory? Does the Rorschach really span the entire
projective-psychometric continuum with the high degree of power
claimed by some of its proponents?

The most rudimentary form of quantification in the Rorschach 
is the assigning of symbols to certain kinds of responses which are 
then looked upon as signs pointing to various personality attributes 
or nosological classes. An excellent example of such a classification 
of qualitative signs is the analysis of verbalization described by Rap- 
aport (35), who presents a very careful rationale for the scoring of 
such pathognomic verbalizations as confabulations, contaminations, 
confusion, absurd responses, and ideas of reference. Such signs are 
not additive except in the very crude sense that a number of positive 
signs in a single record tend to pile up in confirming the diagnosis. 

The widely used “formal” scoring methods for the Rorschach 
represent attempts to measure the perceptual variables implicit in 
the response. The complex nature of the stimulus permits a wide 
latitude of location, of determinants, and conceptual content. Once 
decisions have been made as to what constitutes a discrete response, 
the number of such responses to a given inkblot or to all 10 Ror- 
schach plates can be determined. Although there are some minor 
problems encountered in deciding when a verbalization is truly a 
response for purposes of scoring, one can safely assume that inter- 
scorer agreement as to number of responses (R) is quite high re- 
gardless of the judge’s theoretical position. Similarly, the scoring 
of location, at least in its gross elements of whole, usual large detail, 
or small and unusual detail, does not pose serious problems in the 
attaining of reasonable objectivity. Aside from specialized uses of 
content such as Elizur’s anxiety score (11), the categorizing of con- 
cepts into human, animal, and other generic classes is quite 
straightforward also. The greatest difficulties in achieving scoring 
objectivity arise in the realm of response-determinants.

Trying to determine those stimulus attributes which are responsi- 
ble for eliciting a given response amounts to a kind of global psy- 
chophysics for which the general laws have yet to be worked out. 
Although logical in their conception, most scoring systems for
determinants involve a number of highly arbitrary decisions, the
wisdom of which is highly debatable. The subjectivity of the method and
the influence of factors extraneous to the blots such as the examiner-
subject interaction (40) and variation in style of inquiry (17) raise 
troublesome questions concerning the meaning of scores once 
achieved.

Presumably the inquiry phase of the Rorschach is designed to
discover the characteristics of the inkblot which prompted the
sub[...] [...]er, the subject will say, “It just looks like it [...]”
[...] examiner about where he started. And even if the subject does
mention the color as playing a part in the concept, do we have any
way of knowing whether the subject would have reported blood in the
absence of color? How do we know it wasn’t the combination of form
and shading that suggested a bloody thumb? The unfortunate fact is
that we simply don’t know, al[...] studies by Baughman (2) provide a
better basis for [...] and has tried to overcome [...]ve inquiry than
the usual [...] asking many more ques[...] with inquiry immediately
[...] until all 10 inkblots have [...]es in the formation of the
[...]ies such as surface texture [...]ganization activity, 15 with
[...] the single response such as [...]ion, there are six scales
dealing with the protocol as a whole. When one [...]unt of
informa[...] available about the [...] correlates between these
attributes and th[...] very nature of the [...] respect to the
determinants or global psychophysics of the reported percept, even a
highly trained introspectionist would be hard put to verbalize
accurately the relative importance of various inkblot
characteristics in forming the percept. Since the greatest value for
the Rorschach is claimed to be the study of psychopathology where 
the subject’s ability to introspect accurately may be seriously im- 
paired, there appears to be little real hope of obtaining the kind of 
information necessary to use many of the scales Zubin has proposed. 
Although Zubin’s system may not really increase the objectivity of 
scoring for the Rorschach, since it is comprised largely of five-point 
scales for recording clinical impression, his exhaustive approach 
immediately points out the fundamental weaknesses inherent in the 
standard methods of scoring. 

In addition to the fact that objective scoring for most inkblot 
variables cannot be achieved without the use of arbitrary rules, the 
standard Rorschach is inherently poor as a psychometric device in 
some other important respects. Providing the subject with only ten 
inkblots and then permitting him to give as many or as few re- 
sponses to each card as he wishes characteristically results in a set 
of unreliable scores with sharply skewed distributions, the majority 
of which fail to possess the properties of even rank-order measure- 
ments. One record with an R of 20 may be comprised of single 
responses to the first nine cards and 11 responses to Card X, while 
another may consist of two responses per card. Any of the usual 
scores with the possible exception of form level will have quite
different meanings in the two contrasting protocols even though 
the total number of responses is constant. Add to this the difficulties 
arising when R varies from less than 10 to over 100, and it is easy 
to see why most quantitative studies involving the standard
Rorschach yield confusing or negative results.
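
The two contrasting records with R of 20 described above can be
written out card by card. The per-card counts follow the example in
the text; the function name is introduced only for illustration.

```python
# Responses per card (Cards I through X) for the two records in the text.
record_1 = [1] * 9 + [11]  # single responses to nine cards, 11 to Card X
record_2 = [2] * 10        # two responses to every card


def card_x_share(record):
    """Fraction of all responses that were given to the last card."""
    return record[-1] / sum(record)


for name, record in (("record 1", record_1), ("record 2", record_2)):
    print(name, "R =", sum(record), "Card X share =", card_x_share(record))
```

Both records have the same R, yet more than half of the first comes
from a single card, so any score expressed relative to R rests on
quite different stimulus material in the two cases.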

In a general review of statistical methods applied to Rorschach
scores, Cronbach (8) has considered several ways in which the
confounding effect of R upon most other variables can be reduced:
(a) computing percentage ratios of each variable over R; (b)
removing the linear effect of R by partial regression techniques;
(c) reducing the effect of R by plotting the variable against R and
drawing a freehand line fitting the medians of the [...] (a form of
curvilinear partial regression); or (d) dividing the sample into a
number of subgroups that are homogeneous with respect to R before
proceeding with any quantitative analysis of other variables. The
usual procedure of computing percentage ratios is highly
unsatisfactory because of the crude metric qualities of most [...]
different in meaning due to different patterning of responses across
the 10 cards.
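
Three of the adjustments for R listed above, (a), (b), and (d), can
be sketched briefly. This is a minimal illustration under stated
assumptions, not Cronbach’s own procedure: method (b) is shown as
simple least-squares residualization on R, and the band width used
for method (d) and all function names are hypothetical.

```python
def percentage_ratio(score, r):
    """Method (a): express a score as a percentage of R."""
    return 100.0 * score / r


def residualize_on_r(scores, r_values):
    """Method (b): remove the linear effect of R from a set of scores.

    Fits score = a + b * R by least squares and returns the residuals.
    """
    n = len(scores)
    mean_r = sum(r_values) / n
    mean_s = sum(scores) / n
    sxx = sum((r - mean_r) ** 2 for r in r_values)
    sxy = sum((r - mean_r) * (s - mean_s) for r, s in zip(r_values, scores))
    slope = sxy / sxx
    return [s - (mean_s + slope * (r - mean_r))
            for s, r in zip(scores, r_values)]


def group_by_r(records, band_width=10):
    """Method (d): split records into subgroups with similar R.

    The band width of 10 is an arbitrary choice for this sketch.
    """
    groups = {}
    for record in records:
        groups.setdefault(record["R"] // band_width, []).append(record)
    return groups


# Hypothetical scores on one variable for three records.
print(percentage_ratio(5, 20))
print(residualize_on_r([1, 3, 2], [10, 20, 30]))
```

Method (c), the freehand curve through medians, is a graphical
procedure and is not attempted here.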
Recognizing the serious problems in the interpretation of [...] when
R is a variable, most clinicians make allowance for R in [...]
goes one step further by trying [...] so that three to five responses
[...] (4) avoid the problem [...] first response to each card. [...]
from which to obtain scores, many o[...]ates a whole host of new
problems in [...] satisfactory standards of measurement.

[...]ing conditions and development of pr[...] the Rorschach to
large groups at a time represents another attempt to achieve more
objectivity. [...] (32), Harrower (20), Sells (42), and others have
demonstrated the feasibility of group procedures provided one is
willing to sacrifice certain aspects of the more unstructured,
personalized individual Rorschach. The usual procedure i[...] large
screen for three [...] responses in a standard booklet. [...]
controlled, the subject is usually given a very simple, direct [...]
concerning the role of shape, color, movement, and texture, and
location is indicated by drawing the outline of his percept on a
miniature replica of the blot.

Most of the scoring difficulties inherent in the standard Rorschach
are aggravated still further by use of such group methods. Where
one at least has t[...] of verbalizations and individ[...]
determinants, increasing further the arbitrary nature of the system.
If one uses standard paper-and-pencil aptitude tests as a model,
[...] the most highly structured, psychometrically sound [...] would
appear to be a multiple-choice test [...] standard instructions to
permit its use with large [...]. Under pressure of screening demands
during [...] and others (20) developed a multiple-choice [...] the
subject chooses from a list of thirty concepts those [...] which look
best to him for the particular blot in question. [...] of the 30
available concepts presumably indicate psychopathology while the
remainder reflect normality. Harrower’s system of scoring is unusual
and unnecessarily complicated. [...] answers are arbitrarily weighted
“1” for any concept involving human movement, “2” for any that
represent a popular response, [...] “4” for those which involve
color-form integration, and “5” for space responses. The set of
abnormal answers is assigned weights [...] from “6” to “9” in a
similar arbitrary fashion. The total score [...] summing the weights
for the concepts chosen is con[...]d in its meaning because of the
arbitrary weighting system.

[...] recently, O’Reilly developed a simpler multiple-choice form
[...] choices per blot, four from psychotic records, four from [...]
records, and four from normals. The subject is asked to select the
two concepts which best describe the inkblot. Answers [...] on a
three-point system with “1” for normal and [...] psychotic. Almost
complete separation of normals from psychotics was achieved in a
cross-validation, although the neurotics [...] only slightly higher
total scores than did the normals.
Another interesting, objective approach utilizing the
multiple-choice format is the concept evaluation technique developed
by McReynolds (29). Using Beck’s list of good and poor responses
according to form level (8), McReynolds selected 25 good and 25 poor
concepts spread throughout the 10 Rorschach plates. The subject is
shown the location of the concept and asked to indicate whether or
not the inkblot looks like the concept. Generally given after a
standard Rorschach as part of the testing-the-limits phase,
McReynolds’ concept test yields an objective, scorable, reliable, and
[...] measure of the degree to which the subject can discriminate
good from poor concepts. One of the main advantages of McReynolds’
test is the fact that the number of discrete stimuli ([...] areas of
inkblots) has been increased from 10 to 50 by breaking up the
standard 10 Rorschach plates into smaller components. This point is
a highly significant departure from the usual ipsative method of
allowing repeated response to the same stimulus and probably accounts
for the satisfactory internal consistency (split-half reliability of
.82) that McReynolds obtained.
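
One way to express how well a subject separates good from poor
concepts on a 50-item test of this kind is sketched below. The index
used (proportion of good concepts accepted minus proportion of poor
concepts accepted) is an assumption of this sketch, not McReynolds’
published scoring, and the answers are hypothetical.

```python
def discrimination_index(accepted, is_good):
    """Difference between acceptance rates for good and poor concepts.

    accepted[i] is True when the subject says the blot area looks
    like concept i; is_good[i] is True when concept i is a good-form
    concept on the reference list.
    """
    good = [a for a, g in zip(accepted, is_good) if g]
    poor = [a for a, g in zip(accepted, is_good) if not g]
    return sum(good) / len(good) - sum(poor) / len(poor)


# Hypothetical subject: accepts 20 of 25 good and 5 of 25 poor concepts.
is_good = [True] * 25 + [False] * 25
accepted = [True] * 20 + [False] * 5 + [True] * 5 + [False] * 20
print(discrimination_index(accepted, is_good))
```

Because each of the 50 items is a separate stimulus answered once,
the items can be split into halves and correlated in the ordinary
way.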
As Harrower (20) has pointed out, the highly structured multiple- 
choice versions of the Rorschach are no longer equivalent to the 
standard individual Rorschach except for the inkblots themselves. 
One could go a step further and question whether or not tests that 
have completely fixed response alternatives can even be considered 
projective techniques. In all respects they appear to be objective 
tests of perception which may have implications for the measurement
of important personality traits. The course of development from an
unstructured projective technique to a completely structured
objective test is complete.


A NEW SOLUTION


The fundamental question of how to develop psychometrically sound
scoring procedures for responses to inkblots while also preserving
the rich qualitative projective material of the Rorschach has been
approached from a new point of view at The University of Texas.1 The
major modifications undertaken consist of greatly increasing the
number of inkblots while limiting the number of [...], and extending
the variety of stimulus colors, pattern, and shadings used in the
original Rorschach materials. [...] would probably tap essentially
[...] the classical Rorschach method. Special efforts [...], however,
to develop materials which have high “pulling power” for responses
using small details, space, and color and shading attributes to
compensate for the tendency to give [...] wholes as the first
response to an inkblot.

1 [...] research was given the writer by a Faculty Research [...]
Social Science Research Council, Inc., of New York. More recently
the research program [...] supported by a grant-in-aid from the Hogg
Foundation for Mental Health, The University of Texas.

Such a test would have several advantages over the standard
Rorschach: (a) The number of res[...] relatively constant. (b)
Each [...] view of recent experimental studies of color, movement,
shading, and other factors in inkblot perception, would yield a
richer variety of stimuli capable of eliciting much more information
than the [...] 10 Rorschach plates. And finally, (d) a parallel form
of the test could easily be constructed from item-analysis data in
the experimental phases of test development, and adequate estimates
of reliability could be obtained independently for each major
variable.
The research to date has borne out all original expectations. Two
alternate forms, A and B, of the Holtzman Inkblot Test have been
developed, each containing 45 inkblots. Two additional blots are
common to both forms of the test and appear as practice
blots before the others. Instructions to the subject are similar to 
those used in the standard Rorschach with the exception that the 
subject is asked to give just the primary response to each card, and 
a brief, simple inquiry is made after each response where necessary 
to clarify the location or determinants. Administration of the test 
is easier than the Rorschach, and the subject generally finds giving 
only one response per card is a fairly simple task. 

Six major variables are scored for each response, while a number 
of minor variables or qualitative signs are scored when deemed ap- 
propriate. The major variables were selected and defined according 
to the following criteria: (a) The variable had to be one which could
be scored for any legitimate response. Variables which only rarely
occurred were set aside for the moment. (b) The variable had to be
sufficiently objective to permit high scoring agreement among
trained individuals. (c) The variable had to show some a priori
promise of being pertinent to the study of personality through
perception. And (d) each variable must be logically independent of
the others. Location, Form Appropriateness, Form Definiteness, Color,
Shading, and Movement Energy Level were selected for intensive study
and provided the basis for item-analyses in the final selection and
matching of inkblots for Forms A and B.

Location as a variable was defined strictly in terms of the amount
of blot used and the extent to which the natural gestalt of the blot
was broken up by the response. A three-point weighting system was
adopted with “0” for wholes, “1” for large details, and “2” for small
areas, making possible a theoretical range of scores from 0 to 90.

The scoring of color was based entirely upon the apparent primacy
or importance of color, including black, gray, and white, as a
response-determinant. When the subject named the color in his
response, scoring was relatively simple. On rare occasions, [...]
it was apparent that the response would have been highly
impro[...], credit for color was given even [...].

[...]ent uses of shading as a [...]schach, no such differenti[...]
rare, only a three-point scorin[...] theoretical range from 0 to 90.

The scoring of movement is linked [...] how it is scored. In the
Klopfer system (24), for example, “airplane” and “bat” present
difficult problems. [...] an airplane does fly, there is no movement
of its parts and no movement relative to any frame of reference
unless landscape is added. Is “bat” to be scored FM for anim[...]
while “airplane” is scored Fm for inani[...] concepts are [...]
uniquely different [...]onfusing from [...] than th[...] scale was
adopted var[...] for movement, through static, cas[...] a weight of
“4” for vi[...]. Movement Energy [...].

Different authori[...]. Working independently [...]pts culled from
inkblot responses, five psychologists placed them in rank order with
the most form-[...] concept at the top. The independent sets of
ranked concepts were then merged to yield an overall rank order for
the entire [...]. [...] points were chosen so that five levels of
form definiteness could be distinguished. The resulting set of
examples served as a scoring manual, with a weight of “0” for the
most indefinite concepts, such as anatomy drawing, squashed bug, or
fire, and a weight of “4” for the most definite concepts, such as
Indian chief, violin, or knight with a shield. Form Definiteness has
a theoretical range from 0 to 180.
Form Appropriateness, the last of the six major variables, is by its
very nature a subjective variable, requiring extensive preliminary
work to make scoring reasonably objective. And yet, it is this very
subjectivity which gives the variable great theoretical importance.
Beck (8) recognized the likelihood that goodness of fit of the
concept to the form of the inkblot would be closely related to degree
of contact with reality and undertook a major study of form level
that has proved to be one of the most valuable contributions to the
Rorschach. Considerable effort was spent in arriving at acceptable
standards for scoring Form Appropriateness. Different responses to
each inkblot were listed separately for each location and rated
independently by at least three judges. A seven-point scale was used
with “0” representing extremely poor fit. Although there was good
agreement of judges in most cases, a final judgment for each response
was reached only after full discussion in conference. The resulting
manual provides a guide to the scoring of Form Appropriateness on a
three-point system with zero for unusually poor form and “2” for
unusually good form. Form Appropriateness can range theoretically
from 0 to 90.
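
The theoretical ranges quoted for these variables follow directly
from one scored response per inkblot on each 45-blot form. A minimal
sketch, assuming per-response weights run from 0 up to the stated
maximum; the top Location weight of 2 is inferred from the
three-point system, and the names below are introduced only for
illustration.

```python
N_BLOTS = 45  # inkblots per form, as stated in the text

# Highest weight per response for variables whose scales are described.
MAX_WEIGHT = {
    "Location": 2,              # inferred from the three-point system
    "Form Definiteness": 4,     # five levels, weighted 0 through 4
    "Form Appropriateness": 2,  # three-point system, 0 through 2
}


def theoretical_range(variable):
    """Lowest and highest possible total score on one form."""
    return 0, N_BLOTS * MAX_WEIGHT[variable]


def total_score(weights, variable):
    """Sum one subject's per-blot weights after checking the scale."""
    top = MAX_WEIGHT[variable]
    if len(weights) != N_BLOTS:
        raise ValueError("expected one weight per inkblot")
    if any(w < 0 or w > top for w in weights):
        raise ValueError("weight outside the scale for " + variable)
    return sum(weights)


for name in MAX_WEIGHT:
    print(name, theoretical_range(name))
```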
The agreement among independent but well trained scorers for a
sample of 46 records proved in general to be very high:
product-moment correlations of .99 for Location, Form Definiteness,
and Movement Energy Level, .97 for Shading, .95 for Color, and .91
for Form Appropriateness. Good estimates of reliability based upon
internal consistency were obtained by using Gulliksen’s matched
random subtest method (18). Correlations ranged from .80 for Form
Appropriateness to .91 for Shading. All six variables proved to be
reasonably normal and continuous in distribution. Studies are now
[...] up to determine the correlations between Forms A and B [...]
several time intervals and populations of subjects.

Once the standardization of the Holtzman Inkblot Test is complete,
it should be possible to develop specialized multiple-choice
versions of the test for measuring variables of particular interest.
Seymour Fisher and Sidney Cleveland have already had some success in
developing a series of multiple-choice items to be used with 40 of
Holtzman’s inkblots which yields a measure of their Barrier Score
(13). The particular inkblot[...] earlier item-analysis data so [...]
by three fairly acceptable c[...]sponse (such as “a knight in
armor”), [...]tion response (such as “x-ray”), [...] as “flower”).
The subject was as[...] [...]o 60 college students by Fisher and
[...]etween the two sets of Barrier Scores [...]orrelation, coupled
with the fact that [...]n the multiple-choice test was much [...]ach
and was more normally shaped, suggests that the multiple-choi[...]


[...] to ambiguous stimuli, has encouraged an almost unbelievably
wide range of assessment techniques under the rubric of projective
m[...]. In focussing upon quantitative [...]ectivity as measured by
repro[...] important problems concerning the meaning of projective
responses has been deliberately side-stepped. Concepts of validity
and their empirical determination, examiner-sub[...]ity of response
across different populations [...]t with only tangentially if at
all. [...] any, of these many pro[...] at the same time, par[...]
[...]itation of the projective [...]ality. While not neces[...]
unfortunate and bewildering array of inadequate quantification
characterizes most projective techniques when there is pressure upon
the projectivist to conform to the rigorous statistical standards of
psychometric theory without concomitant pressure to revise the
technique itself. A major challenge to psychologists interested in
the objective assessment of personality is the development of
psychometrically sound personality tests from available projective
devices, a point made by Thurstone (45) 10 years ago which still
stands today.

An Approach to the Objective
Assessment of Successful
Leadership*


BERNARD M. BASS
Louisiana State University


“HAD I BEEN PRESENT at the creation,” Alphonso the Learned
(1221-1284 A.D.) quipped, “I would have given some useful hints for
the better ordering of the universe.” Alphonso could have made the
same comment today about the chaos in leadership theory and
research.

The construction of typologies o[...] bring some order, for
understandin[...] begin only after some of its import[...] field of
leadership, we were abun[...] guesses. Fisher (27) listed some
1[...]9 distinct ways of typing leaders revealed in the literature
from 1915 to 1948, for leadership has been [...]ing in the fields of
organization, group behavior, and industrial psychology. A second
cluster of definitions emphasizes the leader as focus of attention,
as representative of the group. Again, there already is available a
widely used concept, esteem, the value of members to the group
regardless of their position.

* This work was aided by funds from the Louisiana State University
Council on Research and Contract N7 ONR 35609.

The leader often is defined simply as anyone who engages in 
leadership acts. But, what is a leadership act? 


LEADERSHIP DEFINED 


Agreeing with Bowman (21) and Gibb (33), I consider leadership 
an interaction between members of a group. Although the groups 
are usually face-to-face, this is not considered a necessary condition 
for the occurrence of leadership. It is rather a usual condition. 
Leadership occurs when one member's behavior is concerned with 
changing another member's behavior. 

This definition is close to those of Gurnee (35) and LaPiere and
Farnsworth (46) who defined leaders as agents of change; as persons
whose acts affect other people more than other people affect them.
It also conforms to Smith’s (65) conceptualization of controlled
interaction, and with those defining leadership as influence and as
behavior making a difference among groups.

A may try to change B’s behavior; this is attempted leadership. B
may actually change his behavior as a consequence of A’s attempt;
this is successful leadership. B’s change may result in B’s own goal
attainment; this is effective leadership (38).

This conceptualization differs from Hemphill’s (39) mainly in the
range of acts included in the meaning of leadership. For Hemp-
hill, leadership acts are limited to those concerning alteration of
consistent patterns of interaction within the group. Excluded are:
task analyses, expressions of attitudes, information giving
and seeking, requests of suggestions, proposals, and acceptance or
rejecting of earlier suggestions. I have chosen a much broader defi-
nition. Many of these acts excluded by Hemphill generally will be
regarded as leadership although it will depend on the function of
the specific act.
What are the ways in which A can change B’s behavior? A can
change B’s motivation both in strength and direction. Stated in different
terms, A can change what B regards as his goals and the importance
of these goals. By pointing out the challenge and rewards of study-
ing law, a professor may arouse or strengthen in a student strong
interest in a law career. A Caesar or a football coach may arouse his
men to fighting pitch with a speech before battle. For us, these
are acts of leadership. Lowering of motivation may also be in-
cluded. A baseball manager providing strong, fatherly reassurance
to his “too-tense” team near the end of the race for the league
championship is engaged in a leadership act.

A can relatively strengthen or weaken responses of B to various
stimuli. A can change B’s behavior by reinforcing certain habits
or reducing the strength of these tendencies. Included among these
habits are abilities (habits where we evaluate the response in terms
of success in goal attainment) and attitudes, faiths, and beliefs
(habits where the response is towards what stimulated it). Another
way of describing the same phenomenon is to state that A can alter
B’s abilities to cope with his immediate problem. Concretely, a
sales manager who informs his subordinate of the necessity of sub-
mitting accurate, clear, daily reports of his activities and then
occasionally compliments the salesman on his reports when done
well and criticizes items when not presented correctly, is strength-
ening a habit pattern to submit desired reports. This is leader-
ship. A counselor of a group in therapy who neither “rewards” a
neurotic member for his emotional outbursts nor shows alarm may
be serving to reduce the strength of a behavioral tendency by the
neurotic to exhibit such behavior. Therefore, the counselor is dis-
playing leadership.


ARBITRARY RESTRICTIONS ON THE MEANING OF LEADERSHIP 


Changing the immediate needs of B and B’s ability to satisfy his
motivation are not the only ways of modifying B’s behavior. For ex-
ample, altering the integrity of the central nervous system of the
organism, B, via surgery, injury, drugs, etc., will modify B’s be-
havior. Also, B’s behavior may be changed by changing the cir-

terations of the integrity of an organ- 


aching and psycho- 
ship processes, must 
tship, because they

havioral contagion, influence, followership and vicarious experience.
Space limitations prohibit discussion here. 

Titles and office-holding involve much more than leadership. It is
necessary to distinguish the leadership displayed by the
foreman from all the behavior of a foreman, for foremen shuffle
papers, compute output figures, check inventories, operate
equipment, and so on. If leadership is defined as anything done by
one who holds an office or by one who is designated a leader, we
would find ourselves trying to develop principles encompassing
almost all human behavior.


MEASUREMENT OF SUCCESSFUL LEADERSHIP


Table 1 illustrates many of the possible ways of measuring suc- 
cessful leadership. 


TABLE 1
SOME WAYS OF ASSESSING LEADER BEHAVIOR

                                      Assessor
Mode of             Self              Other Members      Observations, Records
Assessment                            of Groups          and Instruments

Historical          Autobiographical  Recollection       Historical Documents;
                                                         Biographies, Case
                                                         Histories, Interviews

Selective           Announcement      Sociometry         Nominations and
                    of Candidacy      Voting             Election Results

Job Placement       Usurpation        Cooptation         Appointments and other
                                      Appointment        Administrative Acts
                                      to Office

Ratings and         Self              Superior-Buddy-    Observed Roles Played;
Check Lists                           Peer-Subordinates  Frequency of Acts;
                                                         Test Results

Projective          Projective        Projective         Thematic Analysis of
                    Sketches          Sketches           Content of Essays

Effect on Others    Example: Satis-   Observed Changes   Overt Changes in Mem-
                    faction with      in Groups          bers and Groups
                    Group Effort


Historical
Naturalistic case-histories are a common approach employed by 
anthropologists, to assess 


ial groups, and volunteer 
cess is intensive interview- 


(60) or Burleigh Gardner (32). 

Analyses of case histories, as such, were employed by Ackerson
(1) and Brown (22). Examination of biographies to assess leadership
was exemplified by Cox’s study (25) of the biographies of 300
geniuses. Less formal attempts of this sort began with the first
historical writing. For example, Plutarch’s Lives paired Roman and
Greek leaders to assess each member of a pair in comparison with
the other.


Selective 


Stogdill (66) listed 28 major studies assessing leadership by choice
of associates, suggesting as most exemplary the work of Jennings
(44). Many of these
teem, not leadership, per 
much prominence only in the past two decades 
found in Terman’s 1904 
ations by observers outside the 
group have also been commonly employed (e.g., Burks, (23)). 
Studies of election results have b itical scien- 
tists, public opinion analysts, and r tention 
here is often centered on voting behavior rather than on leader 
behavior stimulating the voting. However, Sanford’s (61) work 
illustrates how political elections are related to the personality needs 
of the voter and the stimulus properties of the office-seekers.
Job Placement


Ratings and Check Lists


Most common at present is the use of specially developed rating
procedures and check lists for studying leader behavior. For ex-
ample, the work of Stogdill and Coons (67) and associates illustrates
the factorial approach to construction of behavior check lists be-
ginning with empirical surveys and concluding with theoretically-
oriented, factorially independent scales of leader behavior. These
scales have been used to describe leaders by superiors, associates
and subordinates as well as by leaders themselves. The widespread
use of peer ratings or buddy ratings in the military services is an-
other example of ratings to assess leader behavior (42).

More objective and theoretically-based in their mutual construc-
tion are the categorizations of roles by Benne and Sheats (17) and
the interaction process analysis of Bales (4). In the same class is
Thelen’s (71) description of a method of categorizing behavior in
groups based on Bion’s work-emotionality concepts. In Benne and
Sheats’ methods, observers note which of many defined roles are
played by the various members of the group. For example, suc-
cessful discussion leaders of initially leaderless discussions reliably
have been observed playing the roles of initiator-contributor, opin-
ion-giver, elaborator, compromiser, orienter-evaluator, energizer and
encourager (7). Bales’ procedures reduce subjectivity further. Each
action by a member is categorized in one of twelve types falling into
four areas. The frequency with which each member exhibits each type
of behavior can be measured with high observer reliability. Again, lead-
ers in initially leaderless discussions are found to exhibit certain of
these behaviors with high frequency, particularly those in the areas
of attempting answers and positive socio-emotional responses.


Tests 


Early attempts to describe leader behavior were characterized
by the “armchair” listing of traits found among successful leaders
by the leaders themselves, by observers, or by surveyors of the
leadership literature. Another similar indirect approach was based
on administering personality inventories and other psychological
tests to designated leaders, inferring leader behavior from the traits
found to predominate among the leaders (66).


Projective Techniques 


Torrance (73) has presented ambiguous sketches of leader-fol-
lower situations to groups. The stories told by the groups appear to
provide reliable indices of the overt effectiveness of leadership
within the group. Indirect information about leader behavior has
also been gathered by administering the Thematic Apperception
Test to business executives (40). My Job Contest (26) involving the
analyses of themes of essays by workers at General Motors sub-
mitted as contest entries represents still another projective approach
to studying leadership and behavior in groups.


NEED FOR OBJECTIVE ASSESSMENT 


If we are primarily concerned with understanding, predicting,
and controlling leader behavior, as such, it becomes desirable to
develop ways of “sensing” the behavior itself. On the plane of
observables (51) are needed the similar sense impressions corre-
sponding statistically to the constructs. The bridge from the theoret-
ical model to the protocols of the laboratory will be firmest if
subjective impressions do not intervene between the theoretical
constructs and the “world of facts.” The facts about behavior are
vague. They are doubly difficult to deal with when gained “second-
hand” from observers’ or group members’ reports. We desire opera-
tions definite and quantitatively precise if possible; operations
repeatable and objective. In order to study leadership experi-
mentally, we should like to anchor our definition in leader behavior
measured sufficiently objectively to avoid being tampered with by
observers’ or participants’ biases, unless we want to study biases.
Dangers of Subjectivity 

Reliance on observers, participants, or subjects’ mediation of the 
taw un to be give rise to dangers. Viteles (76) illus- 

ates the error possible in dependin on istor i jew 
studies. The “Hawtho p A peA i 
control of behavior in a ban 
tion which demanded conformance to norms or common standards.
The workers did not reveal to interviewers their animosity toward
management because parts were re-engineered when time studies
were in error in favor of the workers, nor did they report their
“stretchout” of work which was due to fear of being laid off in the
depressed economy.
Even categorizing observed behaviors falls short of desire. Using
the same 13 categories of behavior (a modification of Bales’ classifi-
cation), Bell (16) studied leader behavior in 10 groups in a laboratory
in a northern university and 10 groups in a southern school of the
same size. Significant differences appeared in the categorized be-
havior of emergent leaders in each location. Some of this may have
been due to overt behavior differences, but some of it was probably
a function of observers. Again, Borgatta (20) asked subjects to
write how they would react to the Rosenzweig Picture Frustration
Test; then the subjects showed how they would act; finally, they
were placed in the situation. There were no significant correlations
among the three methods of response. Similarly, Halpin (36) re-
ported little relationship between a leader’s beliefs about what he
should do and what his subordinates said he actually did. The
picture is complicated further by the observation that members of
a group when stating their own opinions tend to compromise what
they privately “sense” and what they perceive to be the group
opinion on the matter (34). Mencius (372 B.C.-289 B.C.) recognized
the difficulties of depending on judgments of leader behavior by
their immediate superiors alone. In paraphrase, his advice to heads
of state was:


“When all those about you, the ruler, say that a man is talented, do not imme-
diately rush to promote him. Only after his subordinates say so also should you
examine him more fully as a candidate for promotion. In the same way, do
not rush to demote a man on the evaluation of his superiors alone.”


Once we have a rationale for understanding behavior, we must
have measurements to promote and communicate our understand-
ings.


. . . the language of number sometimes provides a certain minimum standard
of integrity in communication, without which cooperation of human beings on
some kinds of subjects is almost fruitless . . . Lord Kelvin declared, if you
can’t measure it, you don’t know what you are talking about. (78, p. 366, 368).


OBJECTIVE MEASUREMENT OF INFLUENCE 


Many of the early social psychology experiments on suggestibility
provided objective assessments of leadership. The leader usually
was the investigator; the followers, his child subjects. For example,
Triplett (74) pretended to throw a ball into the air. Then, he
determined the percentage (about 50 per cent) of fourth to eighth
grade children who actually saw the ball go up and disappear.
Binet (18) assessed the susceptibility to influence by having subjects
draw lines indicating their judgment of the length of stimulus lines.
These increased in size up to a certain point. The lines drawn
by the suggestible subjects continued to increase even after the
stimuli presented by the leader, Binet, did not increase.

In more recent years, the classic studies by Sherif (63) and Asch
(3) are similar illustrations of objective approaches to the study of
influence. Sherif showed how a subject’s reports of the movement of
a pinpoint of light (actually stationary) could be altered by his
hearing other subjects’ judgments of the same autokinetic effect.
Asch found that some subjects could be made to declare the shorter
of two lines was actually longer, if all other members of their group
(all experimental “plants”) declared such was the case.

Objectivity in assessment of group products also has been com- 
mon in social psychology. For example, Mayer (47) and many 
later investigators such as Weston and English (77), compared the 
speed and accuracy of performance of children on selected tasks 
when working alone and in the presence of co-workers, finding that 
presence of others facilitated performance. In recent years have 
come comparisons of supervisors of more productive departments 
with supervisors whose departments are lower in productivity or 
other objective indices of group performance (45). 

Studies of communication nets initiated by Bavelas (15) are an-
other example of the development of objectivity in studying in-
fluence and leadership. All communications between members of
groups are restricted to the passing of symbols or notes. Objective
analyses of who passes what to whom are the basis for testing
hypotheses.


‘remoteness from reality.” The 
an the movement of five men across a “mined” 
of solutions were equally good at all four levels: 
n, photographic presentation, miniature scale 
a ng manipulation, and a scale model allowing ma- 

A review of validi 
concluded that observed success as a le 
ficial, brief situation correlated with ¢ 
of the same persons in real life (6) 


ty studies of the leaderless group discussion

analyses of the leadership positions and included initially leader-
less emergencies, leaderless small job management, as well as com-
bat and reconnaissance leadership where the examinee was 
designated as leader in the situation. Test performance predicted 
merit as a noncom (r = .46). Specific acts of effective and success- 
ful leadership were observed and recorded. The average agreement 
among observers’ reports was .83. 

Maximum control of experimental conditions has been achieved 
where the group, itself, is simulated. Each member is stimulated 
in the same way while he believes he is behaving as a member of 
a group. Typical of such studies is one by Raven and Rietsema 
(58). Members of the simulated groups are separated. Each is told 
he is performing a different aspect of the task, but all actually do 
the same job. Standard notes are sent to each subject although he
thinks they come from the other members. Every subject thus
receives the same “group” experience. 


AN OBJECTIVE APPROACH TO ASSESSING SUCCESSFUL LEADERSHIP


Change in member judgment as a result of interaction with others
has been studied on a number of tasks. For example, Jenness (43)
examined the changes in the judgment of the number of beans in
a bottle. Asch’s (3) and Sherif’s (63) techniques, mentioned earlier,
are similar examples. Timmons (72) appears to be the first to have
used the differences in correlations among ranked judgments to
quantify the effects of group influence. He found the accuracy
in ranking solutions to a problem (as measured by the correlation
of subjects’ judgments with the correct judgments) was greater
among subjects given the opportunity to discuss the problem with
others. Preston and Heinz (56) and Hare (37) used the correlations
among judgments by members of a group to measure stability of
judgment, initial or final agreement among members, and degree of
acceptance of the group decision. Talland (64) went one step
further, finding that the correlation between a member's initial
judgments and the final group decision was higher among those
rated as leaders.
These within-subject correlational procedures offer a way of ob-
jectively determining the successful leadership of each group mem-
ber as well as related indices of group behavior. They enable us to
determine objectively how much each member of a group changes
every other's opinion. At Louisiana State University, methods have
been standardized as follows: On each of a series of test problems,
a group of subjects privately rank order their initial opinions of
the true order of familiarity of five words. Or they may be asked to
rank five cities according to size of population. Or they may have
to decide on the order of merit of solutions to problems in human
relations case histories. Then, they carry on a discussion to reach
a group decision. Finally, they privately register their own rankings
again.

Three measures of successful leadership (public, private, and rela-
tive) are derived from the correlations between members in opinion
before and after discussion:

Successful public leadership of a member is how much more the
group decision correlates with his, rather than other members’ initial
decisions added to how much the final group decision is like the
designated member’s final ranking compared to how much he had
disagreed with others initially.

Successful private leadership is how much less a member changes
his rankings than do other members added to how much more the
other members’ rankings correlate with his rankings after discussion
than before.

Relative success as a leader is how much more the final decisions
of other members correlate with the initial decision of a designated
member compared with how much the final decision of the desig-
nated member correlates with the initial decisions of the other mem-
bers.

The total amount of absolute public or private leadership turns
out to be algebraically equivalent to the coalescence of a group
(how much members increase in agreement with each other). Total
relative successful leadership of all members combined is always
zero.
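
The relative measure and the coalescence of a group can be illustrated with a short computational sketch. It is offered only under stated assumptions: the Spearman rank correlation is assumed as the index of agreement between rankings, the correlations are averaged without weighting over the other members, and the function names (`spearman_rho`, `relative_success`, `coalescence`) and the three-member data are hypothetical, not the scoring routine actually used at Louisiana State University.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def relative_success(initial, final):
    """Assumed scoring: for member i, the mean correlation of the others'
    final rankings with i's initial ranking, minus the mean correlation of
    i's final ranking with the others' initial rankings."""
    scores = {}
    for i in initial:
        others = [j for j in initial if j != i]
        on_others = sum(spearman_rho(final[j], initial[i]) for j in others)
        by_others = sum(spearman_rho(final[i], initial[j]) for j in others)
        scores[i] = (on_others - by_others) / len(others)
    return scores


def mean_agreement(rankings):
    """Average correlation over all pairs of members."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman_rho(rankings[a], rankings[b]) for a, b in pairs) / len(pairs)


def coalescence(initial, final):
    """Increase in average agreement among members after discussion."""
    return mean_agreement(final) - mean_agreement(initial)


# Hypothetical rankings of five items by three members (illustrative only).
initial = {
    "A": [1, 2, 3, 4, 5],
    "B": [5, 4, 3, 2, 1],
    "C": [2, 1, 4, 3, 5],
}
final = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, 3, 2, 4, 5],
    "C": [1, 2, 3, 5, 4],
}

scores = relative_success(initial, final)
print(scores)
print(coalescence(initial, final))
```

In this illustration member A, whose initial ranking the others move toward, receives the highest relative score. Every correlation between one member's final ranking and another member's initial ranking enters once with a positive sign and once with a negative sign, which is why the relative scores of all members combined sum to zero.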


Methods of Data Processing 
Earliest data coll paper and pencil (8). Subsequent 
to these initial analyses, data have been collected either by asking 
subjects to register their Opinions on specially prepared IBM mark 
constructed analog computer 

IBM 650 and auxiliary equip- 
ions. If th ter is 
used, the experimenter reads the corr A e tay one 


| elations directly from an 
ly following each problem,


MEASUREMENT RELIABILITY


Table 2 shows the results for four analyses of the split-half re-
liability of the three measures of successful leadership based on per-
formance of 10 or 12 problems, or about an hour of testing.
Motivation of subjects was varied by selecting for the highly moti-
vated sample those rating themselves as strongly interested in
entering advanced ROTC, then collecting the leadership measures
as part of an entrance screening examination. The samples of 60
subjects of medium and low motivation were selected by the same
questionnaire. The motivation questionnaire was validated in several
ways: by finding that it accurately predicted formal applica-
tion to advanced ROTC, as well as by a tendency to appear for the
examination. The fourth sample of 95 subjects was composed of
arbitrarily selected night school students under no particular ex-
trinsic motivation to perform well.

Except for one reversal, a consistent trend emerged. The lower
the extrinsic motivation of subjects, the more consistent the leader-
ship measurements. To maximize consistent individual differences
in successful leadership, it appears necessary to examine subjects
under no extrinsic compulsion to perform well (11). Two or three
hours testing would raise reliabilities to where the measures could
be used diagnostically.
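
The gain to be expected from longer testing can be estimated with the Spearman-Brown formula, r_k = k r / (1 + (k - 1) r). The sketch below assumes that this is the correction behind the "corrected" split-half coefficients and that added problems would be comparable to those already given; the function name and the example figure of .55 for one hour of testing are illustrative only.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened by a factor k."""
    return k * r / (1 + (k - 1) * r)


# Correcting a half-test correlation to full length is the k = 2 case.
half_test_r = 0.50
full_length = spearman_brown(half_test_r, 2)

# Tripling one hour of testing that has a reliability of .55.
three_hours = spearman_brown(0.55, 3)

print(round(full_length, 2), round(three_hours, 2))  # prints 0.67 0.79
```

On these assumptions a measure with a reliability of about .55 after one hour would approach .80 after three hours.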


TABLE 2
CORRECTED SPLIT-HALF RELIABILITIES OF MEASURES
OF LEADERSHIP AS A FUNCTION OF MOTIVATION

                                            Motivation
Measure                             High    Medium   Low     Low
                                    N=135   N=60     N=60    N=95
                                    K=10    K=10     K=10    K=12

Public successful leadership        .82     .50      .59     .55
Private successful leadership       .80     .44      .75     .52
Relative successful leadership      AS      .29      .61     .64

N = number of subjects
K = number of problems administered
For 133 df, p < .01 when r = .22
For 93 df, p < .01 when r = .26
For 58 df, p < .01 when r = .33
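
The significance thresholds noted under the table follow from the usual relation between a correlation and Student's t, r = t / sqrt(t^2 + df). In the sketch below the two-tailed 1 per cent critical values of t are approximate figures of the kind found in standard tables and are entered by hand; they and the function name are assumptions made for illustration.

```python
import math


def critical_r(t_crit, df):
    """Smallest |r| reaching significance, given the critical t for df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Approximate two-tailed 1 per cent critical t values (assumed from tables).
t_table = {133: 2.613, 93: 2.630, 58: 2.663}

thresholds = {df: round(critical_r(t, df), 2) for df, t in t_table.items()}
print(thresholds)  # prints {133: 0.22, 93: 0.26, 58: 0.33}
```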
MEASUREMENT VALIDITY


This examination will be similar in some respects to an earlier
publication reviewing validity studies of the leaderless group dis-
cussion technique (6). Again, construct validity will be considered.
The construct validity of the three 
successful leadership will be compared 
gested that: 

1. Ability to solve the group’s problems should be higher among
those with higher assessments of what was purported to be success-
ful leadership.

2. Esteem among associates should be higher among those with
higher assessments.

3. Those with higher successful leadership scores should be ob-
served by others as exhibiting more successful leadership.

4. Those who attempt more leadership should exhibit higher
assessed successful leadership.


Ability and Successful Leadership 
Table 3 shows the corr 


ence test; (2) by their average initial ae 
i he words of 10 problems; 
academic standing through the sophomore 
year. (The effects of sy erences in motivation and status 
ions within subsamples and then 


TABLE 3
CORRELATIONS BETWEEN ABILITY AND THREE OBJECTIVE
MEASURES OF SUCCESSFUL LEADERSHIP

                              Successful Leadership
Measures of Ability       Public    Private    Relative
ACE                       .14*      .14*       .29**
Initial Accuracy          .21**     .21**      .36**
Academic Average          .12*      .12*       .19*

** p < .01
* p < .05


NE X order Correlation between his initial 
larity of the set of 5 fa- 
miliarity of the words, 7 Words and the correct rank order of 
For all 255 subjects, a significant correlation (at the 1 per cent
level) was found between the ACE and relative successful leader-
ship. Relations with the other leadership measures also tended to
be positive but somewhat less significant. Consistent with the
higher reliability of the leadership measures, the correlations (not
shown) were higher when subjects were lower in motivation. Intel-
ligence seemed especially important to relative success as a leader
among groups where members were more equal in status (r = .41).

Average correlations of .21, .21, and .36 between initial accuracy
and the three measures of successful leadership (public, private,
and relative) were found for the 255 subjects. Again, the correla-
tions were higher (.34, .40, and .46) when motivation was low.

Academic performance showed the same pattern of relationships
with successful leadership but the average correlations, although
significant because of the large number of cases, were not much
above zero.

Generally, the results were consistent with the proposition that
ability to solve the group's problems should be higher (but not too
much higher) among those with higher assessments of leadership,
supporting the contention that the assessments truly were measuring
leadership. The relations are strongest between ability and relative
successful leadership in contrast to the other two measures of suc-
cessful leadership.


Esteem and Successful Leadership

Table 4 shows the correlations between the subjects’ successful
leadership during testing; their esteem as rated by their ROTC
Tactical Officers based on observations over a two-year period; and
their esteem as measured by their peers during the situational test-
ing.

The members of each group tested rated each other on a five
point scale on how much loss to the group's effectiveness would be

TABLE 4
CORRELATIONS BETWEEN ESTEEM AND THREE OBJECTIVE
MEASURES OF SUCCESSFUL LEADERSHIP

                                    Successful Leadership
Measures of Esteem              Public    Private    Relative
Tactical Officer’s Evaluation   a3        .14        A
Esteem by Peers                 .28**     .31**      .36**

** p < .01
* p < .05


incurred if a particular member left the group. A member's self-
rating provided a measure of his self-esteem. The average rating
others in his group assigned him provided the measure of esteem
by peers.

Low positive correlations, again higher where members were lower
in motivation, were found between the three leadership measures
and the Tactical Officers’ ratings. The correlations were lower than
commonly found when observers’ ratings of successful leadership
in initially leaderless discussions have been compared with Tactical
Officers’ ratings (6). Part of the difference may be due to the lower
reliability of the objective data used here.

Esteem-by-peers was significantly related (.28, .31, and .36) to all
three measures of successful leadership. They showed an
inverse relation with motivation. Thus, the cor-
responding correlations (not shown) for only those subjects of low
motivation were .82, .42, and .54. The results are consistent with
earlier positive correlations between esteem and the objective meas-
ures of public and private success in leadership among 95 night
school students (10).

Again, the findings suggest that the leadership measures have
validity as such. Of the three measures, relative success as a leader
seems most valid as judged by its higher relation with esteem-by-
others.

Rated vs. Actual Success as a Leader


At the end of the test, each member of a group checked items
about whether or not every other member had attempted to moti-

ot been ignored, The items were: aroused the 
Iked about the importance of group success; 
ns into operation; changed desires of others; 
made others feel free to take part; inspired others; increased general 
ty; encouraged others to Participate; and supported 


; Coordinated others’ ac- 

acceptable to others. (See Stog-
dill and Coons (67) for a detailed discussion of initiating structure.)
ion items checked by all other 
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memł at 

Ticl pa ct ta of a subject's behavior provided the sub- 
i Seti en = m motivator. Similarly, the average number 
subjective = te a ed for a subject by the others provided a 
corela ione oe > a the yon s success as an initiator. Average 
eet. ra 2 5 = 28 were obtained between rated success 
Average en ” ig measures of actual success as a leader. 
er oe ce eel be u mis ee ‚30 were found between rated 
igh erh E r and the three measures of actual success. 
baar zation re uced the relation for relative success as a leader 

or the other objective measures. 


Attempted vs. Successful Leadership
a Pi aa en measured by the average time in seconds 
Of .71, 91 gr id tibiae during each discussion, exhibited reliabilities 
successful as, er with decreasing motivation (11). In order to be 
Positive co ena a member of a group must attempt leadership. 
obtained me DRE AUOD is expected between the two independently 
Sucöessial | asurements if both are truly measuring attempted and 
ul leadership respectively. 


Significant correlations of .17, .15, and .28 were found for the
255 subjects. Again, probably because of higher reliability among
those of lower motivation, slightly higher correlations of .19, .19,
and .38 were obtained with the 60 subjects of lower motivation.


Construct Validity of Measures of Successful Leadership


The construct validity of the measures of successful leadership
was examined through the correlations between the measures and
the ability of 255 subjects, their esteem, their rated success as
leaders and their attempted leadership.

The initial accuracy and intelligence of the subjects predicted to
some extent their success as leaders, as expected, if the leadership
measures were truly measuring leadership. Esteem of members by
their peers in the group discussions and by their tactical officers
was also positively related with the leadership measures, as
expected. Rated success correlated significantly with actual suc-
cess as a leader and as expected, those with higher measured success
as leaders attempted more leadership.

Generally, the correlations were higher when subjects were
lower in motivation. This conformed to the fact that the reliability
of the measures was higher when motivation was lower. The
correlations were also higher generally for relative successful
leadership than for the absolute measures, public and private suc-
cess as leaders.²


TESTING HYPOTHESES ABOUT LEADERSHIP 


The purpose behind developing the objective measures of success-
ful leadership was to test by suitable experiments a number of
hypotheses about leadership generated by a theory of leadership I
have constructed. Here are some of the results:

The theory suggested and results indicated that more successful
leadership was displayed in 51 groups as problems grew more
difficult regardless of other factors,


$ Support a variety of other propo i. 
tions concerning successful leadership, Many of these are well- 
documented by other investigators examini 

i are not, So 
others are not. 


ts to be filed for £ 


was Te a 
me are “obvious,” E 
Some seem immediately 
uture reference. 


able,” “common-sense;” 
applicable; others are fac 


Further Evaluation 


g successful leadership is relatively simple 
compared to other sim; jective techniques, For example, in 
one such similar method (75), 


Snificantly apart initially, b igni 
apart finally. While the definition of le 


2 For a more detailed report on the construct validity of the three measures, refer
to (31).

identical with the one described in this paper, the method of meas-
urement appears more complex and expensive.

Some other advantages of our method of studying behavior in
groups by measuring the correlations of opinion before and after
interaction include the fact that the scores relate immediately to
definitions of the theory of leadership and the relationships found
among the measurements may be used to examine hypotheses
generated from this theory. Also, the measures are continuous, and
can be defined in algebraic notation. Moreover, each trial, providing
a single measure, is short and self-contained, permitting application
of repeated measurement designs. Again, the procedure is generalized
in that the problems presented to the groups can be drawn from
almost any type of subject-matter requiring decision-making or the
making of ranked judgments. The measurements, in turn, while using
widely varying content for problem-solving, will remain directly
comparable. Subsequently, the outcomes of work on the relations
among the measures yield generalizations of significance in the
study of group phenomena, as such.

The group interaction studied need not necessarily involve oral
discussion or even face-to-face contact. All communications among
members could be by written messages or by any devised symbol
system without any loss of the effectiveness of the technique.
The data lend themselves to both digital and analog high-speed
machine analysis. It is possible to proceed directly from the data
collection session to actual machine processing without any
intermediate clerical work.
nk Process is “artificial,” or unnatural and restrictive. But for 

he ge it may be worth the loss in ee i A in 
Which ean is = like the Bales ss ie en ing i y” a “ 

itions, Th used to study groups operating U k r ‘a 
such nat e method can only be applied as ` tes or en an 
method ural groups. The natural groups can e studie 

is introduced to them as a screening examination or a teamwork
test.
IX


An Actuarial Approach 
to Clinical Judgment’ 


William A. Hunt


Northwestern University 


During the past century the psychological approach to judgment has
taken two divergent trends, with the adherents of each often in open
conflict. One of these approaches, firmly associated with the
experimental tradition, and producing orderly, repetitive data that
lend themselves to nomothetic treatment, I shall call the actuarial.
The other, firmly anchored in clinical practice, and producing
individual data, highly unique in their nature and lending themselves
to idiographic treatment, I shall call the intuitive. Rather than
accept the current view of these as qualitatively different and
irreconcilable, I shall take the position that they are merely the
opposite poles of a rough continuum, a quantitative continuum marked
by the clarity and specificity with which the stimuli are defined, by
the degree to which the judgmental setting is standardized through
careful control of the known pertinent variables and the elimination
of extraneous cues, and by the provision of uniform modes of
reporting or response that lend themselves to convenient mathematical
treatment.


ACTUARIAL TREND 


The actuarial trend in the field of judgment arises in the early
measurement of sensory mechanisms with Weber, Helmholtz, Fechner,
etc., and shortly develops into the full stream of classical
psychophysics, where today the lawful uniformities of judgment are
still being uncovered and investigated. The investigations are going
on not merely in the traditional areas of sensation and perception
but in the complicated fields of affectivity, aesthetics, social
attitudes (29), and even in the field of clinical practice, as our
data to follow

° This study derives from a larger project subsidized by the Office
of Naval Research under contract 7 onr-450(11) with Northwestern
University.


decision theory, and probability 
is source. Work in this classical 
e of physical stimuli, about which 

ical sciences have taught us much. 
A much we want of what, and can 
subsequently with some accuracy. 
There is always sufficient error variance to raise doubts concerning
the complete identity of our repeated trials. That this margin of
error or variability is minor and within acceptable limits does not
negate its existence. Note also that as we move from physical
materials to such complex stimuli as art objects, social situations,
or schizophrenic verbal responses, the margin of error rises and
results in increasing variability of judgment. Yet our data remain
within limits of communality that make it possible to treat them
nomothetically rather than idiographically.


ions in which psychophysical ee ona 
atory ones, highly artificial and with all me 
. Very explicit and meaningfu 
© or observer to control his re- 
are clear, understandable, and bao 

> “heavier” or “lighter,” or the simple 
numerals of so; 


me quantitative scale, Ag 
in greater or lesser 


or even inversions 
the question, “Did 
INTUITIVE TREND IN JUDGMENT


The intuitive trend in judgment has as long a history as the 
actuarial trend albeit not as scientifically respectable a one. It 
comes down to us through the Germanic Geisteswissenschaftlich 
approach, through such cultural historians as Dilthey and Spengler, 
and through Spranger, the student of personality, and his concept 
of Verstehen or understanding. It culminates today in clinical psy- 
chology and psychiatry in what we call clinical intuition, but what 
I would prefer to label clinical judgment. Here we find the clinician 
as judge faced with stimuli that are not clearly defined (What is 
schizophrenia and how much is a lot of it?), that cannot easily be 
controlled and reproduced, and hence raise questions as to whether 
the communalities exist between trials that permit us to assume 
the repetitiveness necessary for sequential data and statistical
prediction.

The judgmental situation is difficult to control and extraneous 
variables intrude as the clinician makes his judgment, at one time 
of a patient in an open social situation in a hospital ward, at an- 
other time in the relatively restricted environment of an examining 
room or office. Nor are the categories of report clear and specific. 
They may be in the vague nosological terms of one of our current 
diagnostic systems, or they may be couched in such general terms 
as “suicidal risk” or “assaultive.” They may even be such general 
statements as “severe anxiety springing from oedipal problems” or 
such specific ones as “this patient should not be allowed to view the 
film, ‘The Three Faces of Eve,’ at the present stage in her treat- 
ment.” Yet communalities often may be teased out, and highly 
complicated stimulus situations may yield sufficiently reliable data 
for usable predictions, even with the relatively unsatisfactory cate- 
gories of classical Kraepelinian nomenclature, and a relatively un- 
developed (compared to the physical sciences) descriptive 
psychiatry to aid us in specifying and clarifying the
symptomatological behaviors which may serve as stimuli.

At every point where we even approach the exactitude of speci- 
ficity and control of stimulus, judgmental setting, and categories 
of report which are typical of experimental psychophysics, com- 
munalities appear in our clinical data, and the uniformity and 
sequential repetitiveness necessary for statistical prediction begin
to show themselves. There is both logic and data, and I shall report
some of this shortly, to support my position that the clinical
judgment is qualitatively related to the psychophysical judgment,
that the differences are ones of quantity rather than kind, that they
are contrasting in amount rather than conflicting in nature, and that
the clinical judgment is a culturally and educationally handicapped
country cousin of the psychophysical judgment, and not a different
species of being.


THE ARGUMENT REVIEWED 


The argument so far runs like this: the repetitive or sequential 
data necessary for establishing probability inferences can be
obtained from the clinical situation. Careful examination will show
the possibility of locating and controlling communalities in stim- 
ulus, setting, and report so that judgments may be repeated under 
like conditions and sequential data may be obtained. Thus a loose 
class of symptomatic behaviors in a patient observed by a clinician 
whose training has some uniformity with that of other clinicians 
and whose judgments are made in an at least partially controlled 
observational setting may produce reports in terms of some diagnos- 
tic category on the basis of which valid predictions can be made 
concerning the patient’s future behavior. These results can be 
duplicated on subsequent occasions with the same and even other 
clinicians. The less possible it is to duplicate the essential conditions 
(which is another way of saying the more unique each judgmental 
setting is) the less reliable will be our predictions, but clinical
practice contains many judgmental situations in which the common
aspects far outweigh the unique ones and in which meaningful
sequential data may be obtained.


While agreeing that many clinical settings are marked by common and
duplicable elements, one can still argue that genuinely unique
situations arise in which it is impossible to repeat the situation
and hence to establish actuarial weightings to guide our predictions.
This is the idiographic position in the idiographic-nomothetic
controversy, stoutly maintained by many clinicians who would claim
that the major characteristic of clinical practice is that clinicians
deal with completely unique patients with completely unique
developmental histories in completely unique environments, and so
must make a completely unique prediction which, by definition, is not
amenable to actuarial treatment. I doubt the frequency of such
occurrences, but let us accept them as possibilities.


With no communalities apparently involved, such a unique situation
cannot be approached actuarially. Does this deprive us of all
opportunity of deriving probability weightings to guide us in
making predictions? Before answering this question let me state
that I think any act of human judgment is in part unique, even in 
psychophysics, although in this latter field we can reduce the 
uniqueness and diminish the error variance in our measures by 
careful cultivation and control of communalities of stimulus, task 
setting, and report. The uniqueness is then minimized and becomes
statistically unimportant, but it remains, as any one who has spent
hours in making psychophysical observations and in processing
psychophysical data can testify. But we have deliberately chosen
an example which admits no communality of stimulus, task setting,
or report. Are we then deprived of any chance of deriving
probability weightings in this admittedly (but probably purely
hypothetically) unique situation? By no means. Our solution is to
transfer the actuarial locus to the clinician himself.

If the occurrence of such unique situations is as common as the
adherents of the idiographic approach would have us believe, the
chances of any clinician meeting only a single one are practically
nil. He will meet many of them. Thus, while each prediction may be
unique in itself and therefore inaccessible per se to probability 
estimates, the clinician himself furnishes a common repetitive ele- 
ment in the judgmental situation. We can evaluate the success 
of each individual prediction and arrive at an over-all actuarial 
expression of how often any clinician has been correct in a past
series of unique predictions. The probability weighting which
results can then be transferred to any future unique prediction for
estimating its probable correctness. If clinician A has been right 
9 times out of 10 in 10 past unique predictions, we can then use 
this “9 out of 10” as a weight to infer the chances that his next 
individual prediction will be correct. We may then extend our ref- 
erence class from clinician A, to clinicians A-Z, and even to “all” 
clinicians, or narrow it to clinician A in one specially defined situ- 
ation, etc., thus achieving probability weightings of increasing 
applicability. And as we derive such probability weightings, we 
shall undoubtedly begin to discover previously unrecognized com- 
munalities existing elsewhere in the situation. The actuarial locus 
may then be shifted to these, once they have been recognized. New 
diagnostic categories, further understanding of the variables
involved in the process of judgment itself, and the development of
suitable categories of report will all add predictive power in time.
Meanwhile, transferring the actuarial locus to the clinician removes
a semantic stumbling block that does much to impede clinical
progress today.
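
The arithmetic of this proposal can be sketched briefly. The example below is an illustration only: the clinician labels, the outcome records, and the function name `hit_rate` are hypothetical and do not come from the studies reported here. It treats each clinician as a reference class, computes the proportion of past unique predictions that proved correct, and then widens the reference class by pooling several clinicians.

```python
from fractions import Fraction


def hit_rate(outcomes):
    """Proportion of past predictions that later proved correct."""
    if not outcomes:
        raise ValueError("no past predictions to evaluate")
    return Fraction(sum(outcomes), len(outcomes))


# Hypothetical records: True means the prediction was later confirmed.
records = {
    "A": [True] * 9 + [False],       # right 9 times out of 10
    "B": [True] * 6 + [False] * 4,
    "C": [True] * 15 + [False] * 5,
}

# Narrow reference class: clinician A alone.
weight_a = hit_rate(records["A"])

# Wider reference class: all recorded clinicians pooled together.
pooled = hit_rate([o for outcomes in records.values() for o in outcomes])

print(f"clinician A: {weight_a}, pooled: {pooled}")
```

The choice between the narrow and the pooled weighting is the choice of reference class discussed in the text; the sketch makes no claim about which is preferable in a given case.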
THE CLINICIAN AS AN INSTRUMENT


In logical terms what we have done is to set the clinician up as
a reference class from which we can derive the sequential obser- 


this probability interpretation to the single case, we are using a 
justified by expediency 
for the purpose of action. As Reichenbach says, “The frequency
interpretation provides merely a substitute for the probability of 
a single case; the choice of the substitute depends on our state of 
o look for the narrowest reference 
are available. But these qualifications do not represent any serious
obstacles to the frequency interpretation; they merely portray the
actual procedure used in all applications of statistics to individual
cases” (27).

I was groping toward such a solution in 1946 when I said, “We
should consider the individual clinician as a clinical instrument,
and study and evaluate his performance exactly as we study and
evaluate a test” (11). In 1951 I dealt with the


ry can handle” (9). In 1955 some 
thes in interpreting the rationale of psy- 
chiatric selection (8). They have been treated at such length here, 
the developmental background for the ex- 
perimental material to be presented later, but because of the vital 
importance of th ion i gical and practical develop- 
ment of clinical ps ccepts a sharp, qualitative 
distinction between the idiographic and the nomothetic on the


ne carefree practice of a happy 
A pa À es no evaluative fore- or after-thought 
since it lies outside the realm of empirical validation. 
IDIOGRAPHIC VERSUS NOMOTHETIC


The most lucid and inclusive discussion of this
idiographic-nomothetic controversy to date appears in Paul E. Meehl’s
classic monograph of 1954, Clinical versus Statistical Prediction
(26). As both an accomplished statistician and accomplished
clinician, Meehl writes with an intensity of feeling that testifies
that his book is no dry scholarly exercise but the working through of
a vital personal issue. He concludes soundly, roundly, and with
convincing logic that clinical prediction should be based upon
actuarial procedures. From his logic there would seem to me to be no
escape, and it is indeed the position taken in this paper. The only
alternative would be a flight into mysticism which would be
disastrous for clinical psychology as either a “behavioral science”
or a “healing art.”

With his logic then, I can find no argument. With some of the
content which he builds into his logical structure, I would demur.
He could have made his case equally well without buttressing it
by what seems at times an unjustified demeaning of the actuarial
potentialities of clinical judgment itself. By opposing clinical
judgment (intuition if you will) to such well developed actuarial
techniques as the MMPI, he at times comes dangerously close to
denying any actuarial potentiality for the judgmental approach. As my
colleague Dr. Roy Hamlin, a persistent researcher in this field of
clinical judgment (and a thoroughly objective one), so pithily
remarked during an APA round table discussion, “Meehl not only
makes a straw man of the clinician but takes his pants away as well.”

It is unfortunate that Meehl has chosen to buttress his case for
actuarial procedures by using comparative data between “test”
and “clinical” procedures in the selection field. The evaluative
criteria in selection are very complex and the simple probability
statistic which he applies is not suitable to this complexity. As
Cronbach and Gleser (5) have recently pointed out in their
“Psychological Tests and Personnel Decisions,” the statistics of
games or decision theory are more suitable for sophisticated
evaluation in this field. I have no doubt that actuarial test
procedures, where adequate and available, ultimately would still
prove more efficient
than clinical ones but the race might well be a closer one were a
different evaluative statistic to be applied. To illustrate the
problem, we have some data never published in complete form and
admittedly involving some inference and extrapolation that suggest
that in one situation both methods ran neck-and-neck in the
percentage of failures identified, but differed widely in the false
positive rate (31). There is no simple technique for weighing
failures identified against “false positives” in evaluation without
the use of exceedingly complicated decision statistics, which
unfortunately were only slowly being understood and adapted at the
time.
Moreover, by limiting himself to studies in which both methods
are used, while tightening the logical force of his argument, Meehl
inevitably neglects any instance in which clinical prediction is used
independently with some success. I can think of two such studies
from our own work (30, 21). Let me state again, my faith in the
actuarial procedure is unshaken. I merely feel that a more
sympathetic treatment of clinical procedures might have uncovered
evidence that they are more efficient than Meehl would lead his
readers to believe.
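
The point about evaluative statistics can be made concrete with a small sketch. The counts below are invented for illustration and are not the unpublished data referred to above; the names `ScreeningResult`, `method_x`, and `method_y` are likewise hypothetical. The sketch shows how two selection methods can agree exactly in the percentage of failures identified while differing widely in the false positive rate, so that a single success statistic hides the difference.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    failures_flagged: int    # later failures the method rejected in advance
    failures_missed: int     # later failures the method accepted
    successes_flagged: int   # later successes wrongly rejected (false positives)
    successes_passed: int    # later successes correctly accepted

    def failures_identified(self) -> float:
        """Share of all eventual failures that the method flagged."""
        return self.failures_flagged / (self.failures_flagged + self.failures_missed)

    def false_positive_rate(self) -> float:
        """Share of all eventual successes that the method wrongly flagged."""
        return self.successes_flagged / (self.successes_flagged + self.successes_passed)


# Hypothetical counts for two methods applied to the same 1,000 men.
method_x = ScreeningResult(40, 10, 30, 920)
method_y = ScreeningResult(40, 10, 90, 860)

for name, m in (("x", method_x), ("y", method_y)):
    print(name, round(m.failures_identified(), 3), round(m.false_positive_rate(), 3))
```

Which method is preferable depends on the relative cost assigned to a missed failure and to a wrongly rejected man, which is the kind of weighting that decision statistics are meant to supply.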
Finally, there is what seems to me to be a serious flaw in the
design of Meehl's comparative study. In most cases he matches
selected tests against unselected clinicians. He compares carefully
developed actuarial procedures, into whose standardization and
validation for the specific situation tremendous effort has gone,
with clinicians


energy, and financing th 
MMPI, had b 


o the development of the 


of a group of clinical psy- 


proposes, 

I doubt if Meehl would raise any serious objection to what I have
said. As he himself remarks, “I would defend simultaneously (and,
I hope, consistently) the two propositions that (1) there are some
behavior phenomena which cannot be best studied in the laboratory,
at least with any confidence in one’s extrapolations, and (2)
until some quantification, at least frequency counts and contingency
measures, is applied to clinical evidence, we can have very little
confidence in our claims.” With this I am in total agreement.
Perhaps our difference, if there is one, could be summarized by
saying that I feel that Meehl views the exercise of clinical judgment
as a necessary evil, whereas I view it as a fascinating phenomenon
with a genuine predictive potential.


PRESENT POSITION AND PURPOSE 


Were I to state my own position, it would be this: 

1. There are some behavioral phenomena which cannot best be
studied under the controlled laboratory conditions necessary for
the development of sophisticated actuarial techniques such as
objective test devices, and consequently such sophisticated actuarial
devices are not currently available for use in studying such
phenomena.

2. Clinical judgment furnishes a technique for the study of many
of these behaviors that is necessary, suitable, and promising.

3. Such clinical judgment must and can be subjected to some
evaluation and developed and improved along actuarial lines.

4. The use of clinical judgment is a necessary and inevitable
preliminary step in our technical evolution toward an actuarial goal
(10). Again, I suspect that Paul Meehl would be sympathetic with
this formulation.

Before presenting data which we currently are obtaining from
our actuarial approach to clinical judgment, let me state that the
motivation for our program stems not from any a priori logical
formulation, but from the hard reality of certain experimental
findings obtained in our earlier investigation of the efficacy of
psychiatric selection in the U.S. Navy in World War II. This program
and its rationale have been presented elsewhere (8), but I should
like here to review briefly several of the studies from which our
interest in clinical judgment arose.


EARLY STUDIES 


A large scale validational study of the efficacy of the Navy’s
neuropsychiatric screening program furnished evidence that it was
successful in reducing subsequent psychiatric attrition during
service (19, 18). Since clinical judgment in the form of individual
diagnoses and predictions from interview impressions formed an
important part of the screening technique, it was not possible to
conceive of how the general program could be valid if such an
integral part of it were invalid. In two subsequent studies we
investigated clinical prediction specifically. In the first of these
a clinician graded a group of 944 seamen suspected of maladjustment.
A rough quantitative scale based on the categories mild, moderate,
and severe was used, and subsequent attrition in each category
confirmed the quantitative judgment of the clinician (30). This was
a pre-planned predictive experiment, and not set up as a post hoc
study. The clinician was selected on the grounds of his general
professional competence, it is true, but we were not interested in
what anyone, irrespective of


tests used in selection experiments, 


Encouraged by these results, we then made a study of the predictive
value of certain diagnostic categories such as neurosis, schizoid
personality, asocial psychopathy, etc., relating the original
diagnosis to subsequent behavior during


disciplinary problem than a normal control group, but to have a
preponderant incidence of alcoholic behavior when they did get into
trouble. Schizoid personalities showed no incidence of alcoholic
difficulty but were a leave and insubordination problem. The
psychopaths were, of course, outstanding as a source
ty, but were particularly noticeable for in- 


common diagnostic categories. The 


promises an objective observational basis on which valid clinical 
judgments may be based, 


FOLLOW-UP AND OBJECTIVES


Two further studies supported our belief that the clinical judgment
that we had been working with was a valid, lawful phenomenon.
We reasoned that the diagnostic judgment should be easier
the more maladjusted the interviewees were; and that, since the 
basis for the judgments must rest on observable behavior, the more 
maladjusted the interviewed group was the more symptomatic be- 
haviors the clinician would note during his interview. Both of these 
hypotheses were confirmed (28, 22). By now we were convinced 
that clinical judgments could be valid and reliable, that clinical 
judgment itself was a lawful orderly process open to experimental 
analysis, and that its study and development offered the promise 
of supplying a useful clinical technique in those situations where 
more formal actuarial techniques, such as objective tests, were not 
available. To be useful, however, we were convinced that it must 
be approached and developed as an actuarial technique. As a con- 
sequence, we began the experimental program I shall now discuss. 

To assist in clarifying the objectives of our program, let me say 
that we envisage it as a combination of basic and applied research, 
basic in that we wish to understand the nature of clinical judgment, 
applied in that we hope our understanding will further its useful 
potential as a tool in clinical practice. In any research sufficiently 
well planned and executed to merit the name it is difficult to sep- 
arate “pure” and “applied” aspects. I cannot imagine any basic 
research which does not have implications for the control and 
manipulation of man and his environment. A narrow involvement
in practicing technique for technique’s sake, a type of schizoid,
narcissistic laboratory play activity, may and does result in the
kind of counting-the-pickets-in-a-picket-fence experimentation which
may
be temporarily accepted, but whose absurdity sooner or later is 
recognized. Nor can I imagine any applied problem, carefully an- 
alyzed and thoughtfully approached, the practical answer to which 
does not add something to our knowledge of the fundamental phe- 
nomenon involved. There may be experimenters in either situation 
who refuse to look beyond the immediate significance of their data 
but this is a human limitation and not one of research methodology. 

This blending of basic and applied interest, this cross fertilization 
and mutual facilitation between approaches, has nowhere been 
better illustrated than in the flourishing post-war program of mili- 
tary research, where the attempt to answer practical problems has 
necessitated the furtherance of fundamental knowledge, and where 
in turn the fundamental knowledge obtained leads to new practical 
applications. Again, one senses a quantitative rather than qualita- 
tive difference, and I have been casting about for some parameters 
on which this could be expressed if not measured. Two seem to 
me to be of use in expressing the quantitative relation between
the basic and applied approaches. These are the generalization 
potential of any group of scientific data, and their resistance to 
scientific obsolescence. 

The more widely we can generalize from our specific laboratory 
situation to other diverse fields and the longer our findings remain 
useful without fundamental revision (resistance to obsolescence), 


> 


can then be applied widely and over a long period of time to many 
gnition and decision making; and spend- 
n a specific recognition problem involving 


h 


In our work we 
findings that would h 


e. The judgmental situation is realistic, 
$ bservers are representa- 
tegories of response are understandable, 


A with space provided for recording 
In essence the task involves the clinician sitting down and judging
test responses in a reasonable approximation to normal working
conditions. This also frees us from the necessity of calling each
clinician into a fixed laboratory situation, thus increasing the
number of clinicians available and the geographical range from which
they may be drawn. We concentrated on responses to vocabulary items
since vocabulary is a brief, easily administered, popular type of
test of wide usefulness which has stood up well through the years and
promises to be valuable for some years to come. It may be scored
objectively for purposes of comparison, and offers a relatively
fertile sample of the thinking processes of the subject.
The judges were all well trained clinicians. The original minimal
criteria for selection were a Ph.D. and four years of full time
professional experience. Most of them were well beyond this minimum.
While our 48 judges must remain anonymous, they are recognized, well
established professional people, drawn from all over the country, and
include many of the leaders in the profession today. Our naive
subjects were all drawn from undergraduate students in psychology at
Northwestern University. We must remember, however, that their
naivete was relative. They are certainly more intelligent and more
sophisticated psychologically than the average man.
The judgments were in terms of a 7 or 9 point quantitative scale,
running from the minimum to the maximum of the phenomenon
being evaluated. Such scales are understandable, easy to handle,
and lend themselves to mathematical treatment. We must remember,
however, that they are rough ordinal scales and as yet we have
made no attempt to meet such problems as equal-interval steps,
etc. Standardized instructions were used with every attempt being
made to render them clear and unambiguous. We will set forth
below the results of our studies in rough chronological order, and
then discuss their implications.


FIRST RESULTS


As sometimes happens, our first approach to the problem was
overly ambitious and somewhat disastrous. Reasoning that the
influences of stimulus context which produce judgmental distortions
in classical psychophysics should also appear in clinical judgment,
we attempted to find anchoring effects in a situation involving
judgments of the amount of schizophrenic confusion exhibited in
patients’ responses to vocabulary test items (1). Professional
clinicians, graduate students in clinical psychology, and naive
undergraduates were used as subjects. Anchoring effects
ra the reliability of the clinicians’ ne te a 

: iabili asu 

nantes ee ee en nveniely related to level of 
fetta "This last was particularly upsetting RR. 
Enke interviews with some of our subjects en. 
fespsition of our data indicated that two opposite contex 


or- 
o general and allow too much OPPA 
i i by our trained clinicians. 


F me 
ur frustration and partly to illustrate ee 
t in such work, we wrote a brief, hum 


he 
» Progressively analytic approach Pe 
step we deemed necessary was to answer the basic question of the
reliability of clinical judgment. If we could establish this, we
could then proceed to investigate some of the more complex phenomena
of judgment.


FURTHER WORK


Phrenic patie iced, 
professional clinicians in the Chicago area. A 7-point scale was
used with improved instructions designed to eliminate the previous
ambiguities. Judgment was on the basis of “how schizophrenic each of
these responses is.” We defined reliability both as agreement of
each judge with the group, and as individual consistency upon
retesting. The original ratings were repeated after intervals
of 8 and 18 months.

Agreement of the judges with the group on vocabulary responses
showed r’s ranging from .78 to .92 for the first rating, .69 to .95
on the second, and .81 to .92 on the third. For comprehension items
these were .64 to .88, .66 to .86, and .71 to .89 respectively.
Test-retest r’s between the first and second ratings for each
clinician ran from .65 to .91 for vocabulary, and .68 to .90 for
comprehension. Since all the clinicians were from the Chicago area we
ran a cross validational group of 16 other clinicians from all areas
of the United States. A single rating only was obtained from this
group. The judge-group r’s ran from .59 to .92 for vocabulary and .63
to .90 for comprehension. Correlations between the mean values
assigned each stimulus by the first group (original rating) and the
new group were .93 for vocabulary and .96 for comprehension.
These reliabilities are quite high and led us to believe not only
that such ratings are reliable for clinical use, but that they can
be used for experimental purposes in the further investigation of
the nature of judgment.
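
The two reliability indices defined above can be sketched as follows. The ratings are invented, the judge labels are hypothetical, and the sketch assumes that agreement "with the group" means the correlation of one judge's ratings with the mean rating given each stimulus by all judges (the judge's own ratings included); the source does not state how the group value was formed.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length rating lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical 7-point ratings of six responses on two occasions.
first = {
    "judge_1": [1, 3, 4, 6, 7, 2],
    "judge_2": [2, 3, 5, 6, 6, 1],
    "judge_3": [1, 2, 4, 7, 6, 3],
}
second = {
    "judge_1": [1, 4, 4, 6, 7, 2],
    "judge_2": [2, 2, 5, 6, 7, 1],
    "judge_3": [2, 2, 4, 6, 6, 3],
}

# Mean value assigned to each stimulus by the whole group (first rating).
group_mean = [mean(column) for column in zip(*first.values())]

# Agreement of each judge with the group.
judge_group_r = {j: pearson_r(r, group_mean) for j, r in first.items()}

# Individual consistency upon retesting.
retest_r = {j: pearson_r(first[j], second[j]) for j in first}

for j in first:
    print(j, round(judge_group_r[j], 2), round(retest_r[j], 2))
```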

Arnhoff’s finding that reliability, as defined by the standard
deviation of the mean stimulus values, was in inverse relation to
amount of professional training still bothered us. Consequently,
with the aid of Nelson Jones and Mrs. Hunt, I made several studies
of the judgments given by naive undergraduate students using our
new and improved instructions (17). While the mean value was
not significantly different from that of the trained clinicians, the
standard deviation of the mean was significantly greater, indicating
that, contrary to Arnhoff’s original results, clinical experience is
acting to increase the reliability of the professional clinicians’
judgments. The correlation between the mean stimulus values for the
groups was .88, however, indicating a high degree of agreement
between their ratings.


INCREASING THE GAP BETWEEN CLINICIANS AND NAIVE RATERS


Encouraged by the high reliabilities obtained thus far, Jones and
I (15) extended our investigation beyond the simple dimension of
“schizophrenic” to include more subtle dimensions or specific
aspects of schizophrenic thinking. Our prediction was that the ratings
of trained clinicians would diminish in reliability but still be within
acceptable limits, while the gap in performance between trained
and naive judges would increase as the ratings became more
difficult. The stimuli were the 50 vocabulary responses previously
used. The new dimensions were potential intelligence (judged
not in terms of the correctness of the response, but in terms of the
potential intelligence level indicated by it), communicability (a
dimension of private-public meaning), and concreteness (the
classical concrete-abstract aspect of schizophrenic thinking). The
experienced clinicians were 31 of our previous subjects all of whom
had rated the words earlier for “schizophrenic.” All 31 made
ratings of “potential intelligence,” 15 of “communicability,” and
[illegible] of “concrete-abstract.” The naive subjects were a new
group of undergraduates, 30 of whom judged on the schizophrenic
dimension, 30 on intelligence, and 15 each on communicability and
concrete-abstract.


Reliability was defined as ea 
For the tra 


intelligence 14 to .83, median .61; for communicability
.39 to .85, median .75

gative values were contributed by only three 
le not quite as reliable, the undergraduates ran 


e clinicians, The r’s between the 
d each stimulus by 


E » „Potential intelligence,” an 
Concrete-abstract” through rank order correla- 
mean value of the stimuli for the four types 0 
*perienced clinicians the dimension of schizo- 


an eBatively correlated with communicability; 
not related to intelligence, and mini 
bility it seems obvious that our clinicians are differentiating between
the dimensions.

The picture of schizophrenic thinking presented by these findings
is in accord with conventional thinking except for the low relation
between schizophrenia and the tendency to concreteness. Concrete
thinking classically has been attributed to the schizophrenic. Our
results, however, are in accord with those of McGaughran and
Moran (25) obtained in a sorting experiment where they found a
tendency to personal, noncommunicable thinking more typical of
the schizophrenic than a tendency either to concreteness or
abstraction.

For the naive undergraduates, however, the results are quite
different. Everything is highly related, either positively or
negatively, to everything else. The evidence would indicate that they
are not distinguishing between the scales, and possibly are falling
back upon some common denominator as a basis for all their
judgments. In any case, the findings confirm the variability picture, and
it seems evident that while our naive subjects perform rather well
when rating for broad, general aspects of disorder, they fall down
when given more subtle dimensions to use. It is at this point that
the superiority of our trained clinicians becomes apparent.


ADDITIONAL CATEGORIZATION 


In our latest investigation, Jones and I (16) have introduced a
different category of disorder with a different dimension of
judgment and a return to the use of responses on comprehension test
items. Using 50 such items drawn from Wechsler-Bellevue materials
administered to a group of Naval disciplinary cases, 15 of our pool
of clinicians were asked to rate on the basis of the asocial tendency
revealed in the responses. The responses were selected to represent
a wide range of such tendency. Ratings also were obtained from
an undergraduate group.

The findings show the same high reliability obtained with
schizophrenic materials. For the clinicians the r’s ranged from a high
of .92 to a low of .64 with a median of .82. A random splitting of
the group gave an r of .94. For the undergraduates the range was
from .88 to .51 with a median of .72, again quite high but lower than
the clinicians. The split-group r was .93. The agreement between
clinicians and undergraduates, based upon the mean values assigned
the stimuli, was .91. The difference between the mean values of
all the stimuli for the two groups was not significant; nor, contrary
to our previous findings, was the difference in standard deviations
significant.
The fact that high reliability or good inter-judge agreement
continues to appear as we extend our reference groups by adding a new
dimension and a new type of disorder indicates some possibility


differences do appear 
seem to be limited to specific situations. We find no evidence o 


PRACTICAL APPLICATIONS
Having described our mot 


basic and applied interests, it i 


method of pedagogical 

podge in which Ei illustrative materia] is usually presented. We 

have published such materi i 

hibited in both vocabul 
ture program envisages further r 


We have no experimental data on their pedagogical
efficacy, but rough observation of their use indicates that


test behavior which Previously was only open to broad, qualitative 
interpretation is obvious. Thr 


scaling techniques, 
The practicing clinician will think of many possible specific
adaptations. One possibility for the near future is the use of
judgments of potential intelligence to provide an internal scatter
measure with any vocabulary test. The responses on such a test
could be scored in two ways: first, with the usual objective stencil 
or rules for “correct” and “incorrect” to give a measure of functional 
intelligence; second, with scaled values on the dimension of poten- 
tial intelligence as we have used it to give a measure of the sub- 
ject’s previous (or potential, if the disorder is thought of as reversible) 
level of ability. The discrepancy between these two measures 
should provide some estimate of the current intellectual deficit. 
One of the most intriguing possibilities stems from the ability of 
our naive judges to perform in a fashion closely comparable to our 
trained clinicians provided we use broad, general dimensions. This 
confirms what common sense and experience have already
demonstrated, that even the man in the street can recognize deviant
behavior (and this without the advantages of training on our scales!).
If this recognition can be turned into reliable scale judgments, the
use of such scales by nursing aides and ancillary ward personnel has
possibilities. Such scales might be helpful in the courts, in social
work situations, and certainly in the military services. It might even
be possible to turn English professors into diagnosticians, as well
as scholars!


PRESENT CONCERNS 


As we speak of the future, we reveal one of the great deficiencies
of our work at present. We have dealt so far with reliability and
have not touched diagnostic validity. Some, I think, can be inferred
from our reliabilities, but irrespective of current squabbles
concerning the identity or difference of the meanings of the terms reliability
and validity, validity does point to a realm of practical diagnostic
applicability with which we have not dealt. Particularly pertinent
is the fact that the stimuli used to date have been carefully
preselected to offer a representative range of patient responses. What
will happen to the reliability of judgment when clinicians exercise it
on unselected test responses? Will actual, run-of-the-mill tests in
themselves offer sufficiently rich materials for such scaling
techniques? Our techniques will be improved as our understanding of
clinical judgment improves.

In closing, I should like to deal briefly with those studies in which
we have explored some of the factors causing distortions of
judgment in the hope that through their understanding and control we
may improve clinical performance. That there are phenomena
common to all acts of judgment has long been a cherished belief of
mine (7). The specific area we have chosen to investigate is the
effect of the context upon the judged value of the stimulus.
Arnhoff’s original study (1) was of anchoring effects, which are part of
the general context. His failure to find these effects did not
discourage us, and we resolved to return to the problem. Fortunately
at this time the interest of my colleague, Professor Donald T.
Campbell, was aroused and his collaboration secured. He suggested a
more sophisticated design that seemed better suited to our purposes
than the one we had been using. With the assistance of Mrs. Nan
A. Lewis, we set to work.

The stimuli were our schizophrenic verbal responses which were
rated for the amount of schizophrenic thinking involved, using a
9-point scale. Northwestern undergraduates provided our subjects.
In essence, our experimental situation consisted of the repeated
presentation of stimuli of median value against a limited context
of stimuli from either the upper or lower halves of the scale; a
transition to the opposite context followed. Context effect was
measured by changes in the judged value of the median stimuli.
Our hypothesis was confirmed; median stimuli were judged higher
when presented in a context of low stimuli and lower when presented
in a high stimuli context (2). For reasons not pertinent here,
involving some general problems of psychophysical technique, we
repeated the problem with tones as stimuli, and confirmed our
previous findings (4). We have demonstrated that these effects can
be used as a criterion for the evaluation of different types of scales
(3).
It was still necessary to bridge the gap between naive under- 
graduates and trained clinicians, 


peated stimulation of the study 
it is understandable that the etec 
bell’s students, Dr. Marshall Segall, 


uch context effects can be demon- 
ocial attitudes (28). 


some practical adaptation, but that furt 
judgmental situation will lead to greater 
and hence to more accurate 
of training the judge has recently been recognized in psychophysics
itself. In an article reflecting much of our own orientation in the
clinical field, Engen and Tulunay say, “The use of unpracticed and
naive Os as instruments of precision, e.g., in measuring sensory
magnitudes on a ratio scale, may not be unlike the use of un- 
calibrated physical instruments. With such instruments, constant 
errors, such as those associated with context, may remain unknown. 
It may, however, be feasible to “calibrate” the human O by giving 
him experience with the various types of psychophysical judgments 
and their sources of bias” (6). If such an august discipline as psy- 
chophysics can benefit from such a program of improvement, may 
not the lowly field of clinical psychology aspire to the same goal? At 
the very least we are in good company! 
Increasing Clinical Efficiency

simple opinion from experience, before he can contribute in the
team development of formulation. 

The Rorschach and other projective devices have probably gained 
a great deal of their popularity because of their adaptability to the 
satisfaction of this need. In many cases, a candid psychologist will 
sum the whole of his value argument for a test by saying that he 
finds it to be prolific in source material for psychodynamic state- 
ments (psychological causes) about patients. These statements are 
not only valued in the clinical setting in which many psychologists 
work, but in a majority of cases there is some real criticism by 
clinical peers if test data are not presented as background for dy- 
namic evaluations. Although these statements rest in some way 
upon data in the test, they are not usually evoked by objective 
aspects of the test as the technical requirement of a test is usually 
stated, and they mostly depend upon professional hearsay or at most 
upon weak data from validity study. We are often guilty of accepting
some halo effect weight to our subjectively derived observations
of a patient because the basic observations were made in the course 
of administration of an objective test. That is, our statements gain 
in staff creditability and in our own belief in them to the extent 
that they are associated with other informational items properly 
termed objective and validated test data. 

An allied utility from using tests is the relief of the clinician from 
the responsibility of making a judgment that cannot be based upon 
any identifiable procedure or objective datum. The test, even if it 
were completely invalid for the situation, nevertheless constitutes a 
decision-making device. The need in the psychologist is comparable 
to the reason we toss a coin. We try to decide certain ambivalent 
situations by shifting the responsibility to an outside event. A test 
indication may have more relationship to the decision than a coin 
would, but often the real value lies in the provision of a sign used 
to dissolve indecision. Complicated test utilities such as these are 
often respectable and real, since they are tools for use in clinical 
areas that cannot be better handled. Of course, we will give up 
such props as we develop more appropriate and valid tests.

At the outset it is important to differentiate between the function
of a test or test situation when it is being used in accordance with
our more formal understanding of the meaning of the word “test”
and the quite proper observations that go along with the
administration of a test and which are, perhaps, better thought of as resulting
from a controlled interview situation. The Rorschach provides a
good illustration of this contrast. It is a test when it yields scores
and profiles, but it is merely a partially controlled interview when
the clinician uses unscored and unstandardized data to describe
the patient’s personality, no matter how much authority and
experience lie behind the interpretations.


CLINICAL VERSUS PSYCHOMETRIC DATA


The confusion of psychometric methods with interviews and with 
other non-metric methods of gaining information may, in part, have 
started from an over-reaction that is historically characteristic of 
psychology. The movement toward acceptance of the use of psy- 
chometric data in practical clinical application occurred against 
a strong tradition in the use of interviews and oral examinations. 
Psychiatry was deeply involved with these subjective methods and, 
in general, psychologists found it hard to compete. It was not very 
effective to pit a test score against the word of the experienced 
clinical man. The struggle for recognition of the value of psy- 
chometrics may have caused over-reaction against subjective 
clinical evaluations to the point where modern psychologists have 
neglected the potentialities of the interview and, particularly, the 
possibility of developing more objective information from the
interview by using methods that control the situation.
If the psychometric as 
interview. It is hard to teach this skill, and encouragement of the
practice often leads to corruption of the objective test evidence. 
In summary, it is probable that we are beginning a period of rapid 
development in efficiency by the development of “cook books” with 
the scheme exemplified by Meehl (9) and developed by Drake (2). 
The distinctive character of this movement is the development of 
clinical generalizations about personality from objective test data 
that are in some way synthesized. This synthesis may be effected 
by use of a code (3) or some other clerical system to skim the va- 
lidity cream from a profile of objective personality scale scores. 


WHO SHALL ADMINISTER THE TESTS?


Tests for clinical use can be divided into those that must be 
administered by a clinical psychologist and those that can be ad- 
ministered by a psychometrist. The psychometrist can have one or 
two years of graduate training or be an in-service trained person 
chosen for the ability to get cooperation and observe general rules. 
Examples of tests that use professional time are the WAIS, the 
Rorschach, the TAT, and a long list of others including many of the 
tests for mental deficit. Some of these tests require nearly half a day 
for administration, scoring, and evaluation. Examples of tests that
can be administered by a psychometrist are the Shipley, the MMPI, 
Sentence Completion, and the Porteus. All of these require skilled 
interpretation, but the completed profile or answer records are the 
starting point. The use of these latter tests is much more efficient, 
at least in terms of professional time investment. 

With the shortage of clinical psychologists and with the general 
resistance of the fully accredited psychologist to doing routine test- 
ing, the tests that require skilled professional time in administration 
or scoring will need to provide much more information or other 
value than will the second type of test before their use is justifiable. 
Progress toward efficiency should increasingly recognize this fact 
as the clinical psychologists feel more responsibility for efficiency 
in their routines. At present, one rarely sees any mention of the 
cost in professional time as part of the validity or other value dis- 
cussion of new or old tests. Some psychologists even seem to feel 
that the processes of testing should be kept complicated and that 
the psychologist should base his professional standing upon them. 
As is true of most of the other significant factors in the
examination of practical test efficiency, it is not easy to place comparative
values upon the differences between tests in professional time for
administration and scoring. There is a danger that we will go too
far, not use individual tests when they should be used, and, even 
more likely, not give enough time to an interview with the patient 
when we are provided with the test data by someone else. As a 
policy, for example, routine MMPI’s can be handled in large num- 
bers by one experienced reader who makes a few screening judg- 
ments on each profile put before him. But for diagnostic and other 
significant decisions, an interview with the patient is imperative. 
For efficiency, such interviews by the clinician should be approached 
with a maximum of completed test profiles and other test informa- 
tion in hand. They become directed and efficient checks upon 
the peculiar significance of the test indications as applied to the 
patient and upon the probable validity of the test data. 

It is apparent by now that a great gain in clinical efficiency can 
that psychology must accept leader- 
psychometricians. It is inexcusable 
gists at the Diplomate level and for 
time scoring or administering tests 
uld be obtained by a psycho- 
ic or clerical help is available, 
inical psychologist not to use 
nical psychologist to bring his 
Cceptance of this position would 
ologists and some large clinical 
ologists are exclusively used with 
or clerical workers. 


TEST EFFICIENCY AND TEST STATISTICS


Test efficiency has 
Suggest practical 




test hits and misses in proper cross validation, is a well recognized 
procedure. Unfortunately, as Meehl and Rosen (8) repeatedly state, 
this procedure is rarely presented properly, and the data are not 
available for it in the case of most of the personality tests. Estima- 
tions of the base rate for ordinary applications together with con- 
sideration of the data for advantageous cutting points are even 
rarer in our literature. The clinician usually works with very little 
of this information even if he is able to state his local problems 
properly. 

On the whole, as Meehl and Rosen show, application of the
efficiency formulas where we can estimate the necessary parameters
tends toward discouraging conclusions. As methods for desirably
changing the predictive probability of statements about patients and
their problems, many of the tests we use are impractically weak;
some actually lower the predictive accuracy because of their
improper cutting scores. We should be much freer to not make
statements from test data. Effectively this means that we can administer
a test and refuse to use a proportion of the scores because they fall
in indeterminate areas. Such a practice will permit strong
statements about a few of the persons tested even when a weak test is
used. A testing program could sometimes be justified when only
one among a hundred scores can be used rationally.
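
The practice of withholding statements for scores in an indeterminate area can be sketched as follows. This is a minimal illustration, not a procedure given in the text: the two cutting points, the scores, and the function names are all hypothetical.

```python
def classify(score, low_cut, high_cut):
    """Make a statement only when the score falls outside the
    indeterminate area between the two (hypothetical) cutting points."""
    if score <= low_cut:
        return "negative"
    if score >= high_cut:
        return "positive"
    return "indeterminate"  # no statement is made from this score


def usable_fraction(scores, low_cut, high_cut):
    """Proportion of scores about which any statement is made."""
    used = [
        s for s in scores if classify(s, low_cut, high_cut) != "indeterminate"
    ]
    return len(used) / len(scores)


# Invented scores: only the extreme scores lead to a statement.
scores = [30, 50, 60, 80]
print([classify(s, 40, 70) for s in scores])
print(usable_fraction(scores, 40, 70))
```

Widening the gap between the two cutting points lowers the usable fraction but should raise the accuracy of the statements that remain.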

When efficiency of a test is based upon the data in the four-fold
table, or overlapping frequency distributions, or upon some
relationship of these values to the base rate, one must decide what
efficiency weight should be given to every one of the types of test
hits, errors, and indeterminacies. The problem is rather simple if
only one category is considered significant. For example, one might
select true positives as the exclusive measure of efficiency. With a
large and stable supply population, that test giving the largest per
cent in the true positive cell would be the most efficient. Although
some situations approach this simple case, there are probably no
completely consistent examples. Some other category such as the
false positive cases will always influence the efficiency evaluation
in the practical situation. Most often a combination of several test
validity categories must be considered. Among these, the per cent
of false negatives is especially likely to be critical. For example, we
are very unhappy if we fail to identify one patient with a brain
tumor because the appropriate test indicator was negative for tumor.
We can, however, accept a few more false positive indications. By
contrast, if a test is intended to predict which boys will become
severely delinquent, the false positive cases can be very
embarrassing. People are quite tolerant of favorable predictions (false
negatives) even if the prediction is wrong. Efficient evaluations
closely depend upon the special demands put upon the test by
the applied situation and are difficult to generalize.
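
One way to make the weighting of the four-fold table concrete is sketched below. The cell counts and the weights are invented for illustration, and the weighted average is only one possible way of combining the cells; the text does not prescribe a formula.

```python
def weighted_efficiency(tp, fp, fn, tn, weights):
    """Average weight per person tested, given counts in the four cells
    of the four-fold table and a weight (value or cost) for each cell."""
    total = tp + fp + fn + tn
    score = (
        weights["tp"] * tp
        + weights["fp"] * fp
        + weights["fn"] * fn
        + weights["tn"] * tn
    )
    return score / total


# Invented cell counts for 100 persons tested.
counts = dict(tp=8, fp=12, fn=2, tn=78)

# Hypothetical weights: a missed brain tumor (false negative) is costly,
# while a false positive is only a mild nuisance.
tumor_weights = {"tp": 1.0, "fp": -0.2, "fn": -5.0, "tn": 0.0}

# Hypothetical weights: wrongly predicting delinquency (false positive)
# is the embarrassing error, while a false negative is tolerated.
delinquency_weights = {"tp": 1.0, "fp": -5.0, "fn": -0.2, "tn": 0.0}

print(weighted_efficiency(**counts, weights=tumor_weights))
print(weighted_efficiency(**counts, weights=delinquency_weights))
```

The same table of hits and misses yields quite different efficiency values under the two sets of weights, which is the sense in which the evaluation depends upon the applied situation.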

Fortunately, the samples on which we usually work are enriched
samples; they have a larger base rate than does the general
population. Using a test of schizophrenia for hospital patients, we know
that the rate of the criterion condition will be higher than among
non-patients. Real suicidal intent is rare in general populations, but
among hospital patients who look depressed, it is many times more
common (10). Our tests are fortunately used so that they operate
as successive hurdles in probability. This can help the efficiency by
pushing the base rate of the critical event upwards. Arguments
such as these mitigate the dismal picture of clinical test efficiency
that is suggested by candid consideration of the statistical facts
available.
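
The effect of an enriched base rate can be worked through with Bayes' rule. The sketch below is illustrative only: the hit rates and base rates are invented, the function names are hypothetical, and the treatment of successive hurdles assumes that each test's errors are independent of the earlier ones, which real tests will seldom satisfy exactly.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a person with a positive test indication
    actually has the criterion condition (Bayes' rule)."""
    true_pos = base_rate * sensitivity
    false_pos = (1.0 - base_rate) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def successive_hurdles(base_rate, tests):
    """Apply tests in sequence to those positive on the previous one.

    tests is a list of (sensitivity, specificity) pairs. The predictive
    value after each hurdle becomes the base rate for the next
    (assumption: the tests err independently of one another).
    """
    rate = base_rate
    for sensitivity, specificity in tests:
        rate = positive_predictive_value(rate, sensitivity, specificity)
    return rate


# The same hypothetical test in a general population and in an
# enriched hospital sample.
print(positive_predictive_value(0.01, 0.80, 0.90))
print(positive_predictive_value(0.30, 0.80, 0.90))

# Two hurdles in succession, starting from the general population.
print(successive_hurdles(0.01, [(0.80, 0.90), (0.80, 0.90)]))
```

With the same test, a positive indication is far more often correct in the enriched sample than in the general population, and each additional hurdle raises the effective base rate further.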

In summary, statistically evaluated test efficiencies can be applied 
when proper parameters are available or can be estimated. These 
ed for the special test application 
the importance of each of the various 
ould be more hard headed about the 
ms and tables, since it can often be 
treating the data would justify routine 
d in other cases it would become ap- 
nly under restricted conditions. 
ncy in the use of clinical resources is 
e psychologist should be foremost among those 
. There can be no doubt that a great deal of 

n tests, and much of the routine application 

if we choose to be critical, This is true 
ae are selected or when other 
ie i „rable tor increasing efficiency. Optima 
en of ore points or identification of the hapet test 
usually be achieved in ordinary clinical practice. 


WHY DIAGNOSE?
devices. One must admit that the diagnoses for functional mental
disorders do not have the desirable properties of good medical 
diagnoses. 

The most popular tests we use today, as well as our other diagnostic
procedures, are devoted to the production of diagnoses that
will approximately fit the professional culture. We use the tests 
that will provide the proper statistics to indicate the rates of the 
traditional mental illnesses, that show the types of patients handled, 
or that provide a basis for disability or legal status. There are rarely 
any crucial elements except these cultural ones involved in deciding 
whether a patient is schizophrenic or a severe neurotic. 

In a practical application we do not always want to predict who 
would be diagnosed schizophrenic by a clinic. We really want to 
know which persons will become, or presently are, ill with the so 
far incompletely defined illness. If there is a morbid unity tending 
to evoke the diagnosis of schizophrenia, it is obvious that persons 
with this disorder are often given some other diagnosis or are called 
normal. This is apparent from the fact that clinicians and clinics 
do not show high reliability (7). What we may want to identify 
with a test is the true illness, schizophrenia, and the clinic’s diag- 
nostic system can be thought of as another test with less than perfect 
efficiency. Either the clinical diagnosis or the test could be the
better indicator. 

For a sophisticated appraisal, we should check both the test and 
the clinic against multiple symptoms or other items that will permit 
a relative decision about the efficiencies. We have too much dis- 
regarded this aspect of testing efficiency by seeming to assume 
either that it is good enough for a test to be developed for agreement 
with clinical diagnoses (which means inevitably limiting the
validity), or that some arbitrary amount of disagreement with
diagnoses invalidates a test. It seems that, of these two unhappy
alternatives, it is better to choose the former unless there is no
value in making clinical diagnoses or prognoses.

It is interesting that no one has broken the Kraepelinian system.
Good arguments against it are cliches taught to all clinical
psychology students. Similarly, good constructive ideas have been
developed and partly established in local practice. Apparently no
one has had prestige enough or, more likely, no one has really
offered a convincingly better system. Even Adolph Meyer and
Freud have changed no more than a few small points. The new 
classifications that have become official are never more than a 
change of words. Psychologists have tried to hold aloof and unsullied
by the use of their tests for supporting a classification that
is often exemplified by majority or prestige vote in a eo = 
ference; but as clinical psychology accepts more real an er 
sponsibility, this aloof position is untenable. ar 7 
practicing majority of clinical peers and the common pro = nn 
language are changing very little. It is in one way ng 5 : 
enough that psychologists have caused so little change. T z p r 
chologists suffer from a plethora of prophets with different thes 
and these have little in common except the call for a change 
Summarizing the foregoing argument, it could be that some
objective clinical tests have been unjustly accused of low validity and
efficiency. If present-day clinical practice is largely based upon
arbitrary diagnoses and dynamic formulations rather than upon
theoretically or scientifically demonstrated indications, test
efficiency may now be nearly as high as the predictable upper limit for
such arbitrary symptom conglomerates. This thought is a
cheerful one, and the clinical literature is providing some support
for it. More and more useful validity items are appearing to
support an increasing faith in the objective approaches to personality.


WHAT IS NEEDED


What we need today in routine practice and what we can now
get from our testing and clinical studies is uniformity in the
diagnostic reactions of clinicians to the traditional selection of diagnostic
symptoms. It is desirable to increase the agreement among
clinicians and clinics in applying the accepted diagnostic terms and in
making other ordinary clinical decisions. The dia


ery likely based u 


sequences or better ther- 
r to another. Here again, 
$ prestige are more involved than is any estab- 
lished fact about th gnosis or treatment. 


om the cultural uniformity 
of the professiona] clinici 
illness, it b 


efficiency by using 
niformity of clinica] decisions. A reduction 
furiate some psychologists, but it provides for a real security and
useful contribution to those who are clinically operating in the cul- 
ture. Starting with this point of view, test devices have high effi- 
ciency when they provide maximum uniformity and coherence in 
diagnostic and psychodynamic formulations together with the usual 
psychometric qualities of objectivity and reliability. In effect, we 
would gain in efficiency if we abandoned the attempt to validate 
personality tests by increasing the agreement with diagnoses or 
clinical estimations beyond a certain point. Advances in validity, 
once an approximation to criteria of clinical usage is reached, should 
come more in the form of construct development and in improving 
the purity of the generalizations from test data. 

In this way of thinking, the efficiency of certain tests will steadily
rise as the clinical jargon, which develops from the objective data
provided in the test signs or scores, becomes more widely used
and the universality of patient-descriptive language is advanced by
this. This development could make it more possible for different
clinics to select more nearly replicable groups of patients for special
treatment or study. Objective tests with standard conditions are a
necessary prelude to the development of better diagnostic and
therapeutic science. Projective devices and expert diagnosticians are not
a substitute for this.


A CASE IN POINT


Among numerous illustrations which could be selected, the Pd 
score of the MMPI provides a good example. The Pd score was 
Originally aimed at producing agreement with clinical staff di- 
agnoses. The criterion groups were made up of various patients 
diagnosed psychopathic personality. No one has ever thought that 
this diagnosis significd a constant disease or even a stable pattern 
of symptoms. But even before the Pd score was derived, it had 
been suggested, on clinical and psychometric data (4), that there 
was a smaller group of persons among the psychopaths who were 
much more similar in symptoms, behavior, and other items. These 
have gradually emerged as a useful type and today the psycho- 
pathic deviate or asocial sociopath is one of the clearest persono- 
logical constructs. The long suspected defect in emotional func- 
tion within this sub-group appears to have been substantiated by 
Lykken’s (6) finding of deficient autonomic (“anxiety”?) condition- 
ability. If such new indicators serve to link familiar and useful 
clinical data to more discriminative psychometric scales, test effi- 
ciency will increase although the agreement with the older diag- 
nosis may even decrease. 

We would, of course, prefer that decisions from tests should rest 
upon determined validities and established improvement over base 
rate predictions. The average clinical situation is far from this ideal 
at present, and if one insisted upon practicing according to the ideal, 
one would find it hard to use personality tests. 

Many clinical psychologists have a tendency to quit using psy- 
chometric data, possibly because they become uncomfortable with 
the suspicion or evidence that their test data are not improving 
accuracy of decision. It may also be that they are impressed with 
the fact that a majority of the psychiatrists, and probably most of 
the clinical psychologists, in private practice, do not use tests. If 
laziness is the real reason a psychologist does not use tests, it is hard 
to prove. When the clinical psychologist does not use tests, he has 
ceased to identify himself with the scientific future of his profession. 
However, if diagnostic or other contributions of a testing procedure 
do not sufficiently allay the insecurity of the clinician, do not pro- 
vide him with use 


se ther omfort in the fact that the field of psychological 
medicine is full of Even admitting the ne om 
other clinical procedures. If we 
usly pessimistic, we need only to 
tude, interest, and ability evalua- 


ements show the basis for expectation that we 


efficiency, 


References 

1. Cronbach, L. J. and Meehl, P. E. Construct validity in psychological 
   tests. Psychol. Bull., 1955, 52, 281-302. 
2. Drake, L. E. Interpretation of MMPI profiles in counseling male 
   clients. J. counsel. Psychol., 1956, 3, 83-88. 
3. Hathaway, S. R. A coding system for MMPI profile classification. 
   J. consult. Psychol., 1947, 11, 334-337. 
4. Hathaway, S. R. The personality inventory as an aid in the diagnosis 
   of psychopathic inferiors. J. consult. Psychol., 1939, 3, 112-117. 
5. Hathaway, S. R. and Monachesi, E. D. The personalities of pre- 
   delinquent boys. J. Crim. Law, Criminol., and Police Science, 1957, 
   48, 149-163. 
6. Lykken, D. T. The study of anxiety in sociopathic personality. J. 
   abnorm. soc. Psychol., 1957, 55, 6-10. 
7. Meehl, P. E. Wanted—a good cookbook. Amer. Psychol., 1956, 11, 
   263-272. 
8. Meehl, P. E. and Rosen, A. Antecedent probability and the efficiency 
   of psychometric signs, patterns, or cutting scores. Psychol. Bull., 
   1955, 52, 194-216. 
9. Mehlman, B. The reliability of psychiatric diagnoses. J. abnorm. soc. 
   Psychol., 1952, 47, 577-578. 
10. Rosen, A. Detection of suicidal patients: an example of some limita- 
   tions in the prediction of infrequent events. J. consult. Psychol., 
   1954, 18, 397-403. 




XI 


Future Impact of Psychological 
Theory on Personality 


Assessments 


James G. Miller 
University of Michigan 


Mike Todd's moving picture 
“Around the World in 80 Days” included in addition to the primary 
stars a number of other outstanding actors, stars in their own right, 
who consented to play brief vignettes called “cameo parts.” And 
in such a sense we can all say, not “I Am a Camera,” but “I Am 


, behaving systems surrounded by a skin 
e-time in an engaging variety of patterns. 
which separates us from the environment. 
is the subject-matter of a number of en- 
most of which are physical or biological. In 
making their observations these disciplines employ the dimensions 
gram-second system and its derivatives, like tem- 
perature. In addition they are beginning to employ in some situa- 
tions a new sort of dimension, the units of information theory. 
onmental sciences derive a unity from this joint 
use of dimensions and the classical Conceptual system which under- 
lies them. Upon crossing the boundary of the skin, however, things 


684, Dr. James G. Miller Principal In- 
are not necessarily those of the 
y- 
change. Occasionally the units of the environmental sciences are 
used, but much more characteristically the dimensions measured 
in behavioral and personality assessment are entirely different and 
quite unrelated. 

The psychology of personality usually measures traits with names 
now used technically after having originally been parts of common 
speech. This fact was dramatized a few years ago when Allport 
and Odbert (1) went through an ordinary dictionary and compiled 
a list of words used to describe personality or behavior, hoping 
thereby to get a full sampling of all possible characteristics. 

The way someone trained in physics or engineering would tech- 
nically describe the action of an automobile and the way a person- 
ality psychologist would describe the behavior of a person are 
strikingly different. The engineer would say that a car has a weight 
of 3400 pounds and that its previous acceleration at full throttle 
was 24 ft. per second. Now, however, it has a small pebble of 3/16 
inch average diameter in the gasoline feedline of 1/4 inch diameter. 
As a consequence the rate of gasoline flow at maximum throttle has 
been diminished from 8 gallons to 1 quart per hour, so the maximal 
acceleration is now only 6 ft. per second. If the car were a human 
being, a student of personality would say that, although it had a 
somatotonic body type, it had formerly rated four on a scale of 
alertness. This trait was now diminished to a rating of one, and 
sluggishness had increased from two to five. 

This striking difference arises partly from the divergences be- 
tween the ways we measure the action of human beings and the 
acts of nonliving systems. A person customarily serves as an observer 
or rater of the acts of other persons, without intervening objective 
instruments to quantify the observations. Environmental sciences 
characteristically use precisely calibrated instruments. To a large 
degree therefore they eliminate the human error involved in rating. 

Some observational activities—for example, pattern recognition 
—must be done by human beings because as yet we do not have 
adequate instruments for them. Pattern recognition—of faces, be- 
havior sequences, similar designs—is essential in all science. Content 
analysis of verbal or written communications is another field in 
which the human rater must be employed. Continuing efforts 
should be made to discover ways to replace uncalibrated human 
beings with more precise instruments in such fields as pattern recog- 
nition and content analysis. And as long as human beings are used, 
the linguistic and other difficulties involved in rating methods 
should be recognized and everything possible must be done to 
hich occurs among 
content analysis. 

avior, it might be quantified 
has been. Dominant sym- 
me apparatus like the inter- 
number of times in an hour 
and the number of seconds of 
at of others with whom he is 


d probably be quantified by 


a sort (like money) the rewards 
a person requires to carry out certain standard acts. The more 


ted during a standard period of 
s. 


Schizoid. The percentage of characteristically abnormal meta- 
phors or grammatical constructions in a standard sample of talk 
can be measured. So can the number of times that hallucinations 
are reported or that the patient appears to be hallucinating. 

Moderate word fluency. The number of bits per minute that the 
individual can read aloud or silently with correct responses con- 
cerning comprehension can be measured. 

High average performance I.Q. Commonly we measure this by 
total score from various items or subtests in a test like the Wechsler- 
Bellevue. The relationships of these subtests one to another are 
not well understood. Instead we can administer tests involving 
a known amount of complexity measurable in bits, determining the 
rates at which a person can solve problems of different known de- 
grees of complexity. Such a method can be applied with equal 
objectivity to human beings and to lower animals. 
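
To make the idea of complexity measured in bits concrete, here is a minimal sketch. It is not the procedure used in the research described here. It assumes that a problem amounts to one choice among equally likely alternatives, so that its information content is the base-2 logarithm of the number of alternatives, and it expresses performance as bits handled per minute. The function names and the sample figures are hypothetical.

```python
import math


def bits_for_choice(n_alternatives):
    """Information, in bits, of one choice among equally likely alternatives."""
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_alternatives)


def bits_per_minute(total_bits, seconds):
    """Rate of information handling, in bits per minute."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return total_bits * 60.0 / seconds


# Hypothetical record: (number of alternatives, seconds taken to solve).
problems = [(2, 3.0), (8, 12.0), (64, 45.0)]

# One rate per problem, so that rates at different known degrees of
# complexity can be compared on a single scale.
rates = [bits_per_minute(bits_for_choice(n), t) for n, t in problems]
```

Because every problem is scored in the same unit, the rates for a one-bit and a six-bit problem are directly comparable, which is the property the subtest total score lacks.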

There is more or less general agreement among psychiatric cli- 
nicians concerning the standard form of diagnostic evaluation; cer- 
tain published procedures for determining mental status are quite 
widely followed. But most of the items of such evaluations are 
qualitative rather than quantitative, or only quantitative in the 
roughest sense. For example, the number of digits which can be 
repeated forward and backward, or a series of informational ques- 
tions of graded difficulty is given as basis for a rough estimate of 
effective I.Q. Various efforts have been made by Wittenborn, 
et al (10), Malamud and Sands (4), and others, to develop 
quantitative scales of psychiatric status—instruments which can be 
used not only when the patient is first examined, but repeatedly, 
to indicate day-to-day changes in some quantitative fashion. Even 
though these scales are in many ways improvements over the usual 
clinical method, they suffer from the psychometric shortcomings of 
other rating scales. These shortcomings include: disagreement from 
rater to rater concerning the definition of the variable being rated; 
inter-rater difference in the amount of experience in rating the vari- 
able and in the sorts of patients remembered as seen previously 
(the reference population); lack of a known zero point and of equal 
intervals between various steps in the scale; nonlinearity of the 
variable; and lack of orthogonality between variables. 

Perhaps the factor analytic method for determining the fundamental 
psychological dimensions is the best available at the moment. 
Certainly this procedure has effectively simplified the dimension- 
alities of the primary mental abilities, of personality, and of semantic 
meaning. The method, however, cannot rise above the shortcomings 
of the testing instruments which provide the original scores, in- 
cluding often the limitations of the human being as rater and evalu- 
ator. Moreover, the derived factorial dimensions do not have known 
relationships to the dimensions of the environmental sciences. 

I am not necessarily committed to the dimensions of natural sci- 
ence if someone can suggest others which are equally good or 
better for measuring the interactions of the individual and his en- 
vironment and consequently for advancing the unity of physical 
and behavioral science. Lacking such an alternative set of dimen- 
sions, we have proceeded to construct a series of tests in which the 
subject reacts to some electronic or other physical apparatus. We 
measure his performance, not along classical psychological dimen- 
sions, but on the sorts of dimensions that might be used by an elec- 
tronics engineer if the subject were a component in the electronic 
system. That is, we determine his personal “transfer function” in 
this system, using C.G.S. units, derivatives, and information units. 


THE DRIVING BATTERY 


One set of such test situations is our driving battery (5). These 
tests have the advantage of face validity, being widely used for 
measuring driving skills in everyday life. On the other hand, they 
perhaps do not measure “pure” behavioral variables, certain aspects 
of the scores really being artifacts of the particular apparatus being 
used—for example braking time in a driver-trainer is not the same 
thing as a simple reaction time. 

The first piece of equipment in this driving battery is the Ameri- 
can Automobile Association’s “Auto Trainer.” This apparatus con- 
sists of two parts: the first includes all the controls of a conventional- 
shift automobile—starter button, speedometer, steering wheel, gear- 
shift lever, ignition key, and accelerator, brake, and clutch pedals; 
the second part is a treadmill-like belt about 10 feet long, which 
extends out from the front of the control unit. The belt, painted 
to resemble a tortuous roadway, revolves when the controls are 
in gear, the speed being controlled by the accelerator. In our experi- 
ment, however, the apparatus was modified so that the speed could 
be set by the experimenter’s controls at a constant fast rate (equiv- 
alent to approximately 20 mph) or a slow rate (approximately 10 
mph). 

A small model car, the steering mechanism of which is controlled 
by the steering wheel of the control unit, rests on the belt, its wheels 
turning as the belt revolves, and the speed of the belt determines 
its apparent speed. The task of the subject is to steer the car so 
that it remains in the center of the roadway painted on the belt. 
A red and a green light are situated at the far end of the belt unit. 
When the green light is on, the driver is to proceed; when the red 
light appears, he is to stop the car as rapidly as possible by de- 
pressing the brake. 

An accuracy counter, a reaction timer, a trial timer, and speed 
controls face the experimenter at the side of the control unit, out 
of sight of the subject. A foot switch with which the experimenter 
can turn on the red light is also connected to the side of the unit. 
Large staples are embedded in the “roadway” every 8 inches. If 
the car is kept in the center of the roadway, it makes contact with 
the staples, completing an electrical circuit and advancing the ac- 
curacy counter 1 unit. The reaction timer measures in hundredths 
of a second the time elapsed between the appearance of the red 
light and the brake-pressing response. 

The subjects are given trials as follows: 20 revolutions of the belt 
at a fixed slow speed; 20 at a fixed fast speed; and 20 at a speed 
controlled by the subject. Six reaction-time determinations were 
interspersed irregularly through each of the 3 trials. 

On this test, scores are obtained for accuracy at the fixed 
low speed, at the fixed high speed, and at the variable speed con- 
trolled by the subject. The unit of measurement is the number 
of staples over which the car passes. Since the staples are embedded 
in the center of the roadway, the subject has to keep the car in the 
middle of the road to activate the accuracy counter. A time score 
is obtained, indicating the time required for each trial when the 
subject is controlling his own speed. During this phase of the test 
the subject is asked to drive as rapidly and accurately as he can. 
A derived score is also figured—the ratio of the difference between 
the accuracy score at low fixed speed and the accuracy score at 
subject-controlled speed, divided by the time score. This speed- 
accuracy ratio, which indicates the degree to which speed is sac- 
rificed for accuracy, or vice versa, may be interpreted as a measure 
of judgment. Reaction times for the brake-pressing response are 
taken while the car is being driven at low fixed speed, at fast fixed 
speed, and at variable speed. 

The steadiness test is an adaptation of the Whipple Steadiness 
Test. The test panel contains a series of holes decreasing in size 
from 7/16 in. to 3/16 in. The subject is asked to insert a 
metal stylus 1/8 in. in diameter into each of the holes and to hold 
it there for 15 seconds without letting it touch the sides of the hole. 
The apparatus is wired so that a timer is activated whenever the 
stylus touches the sides of the hole during the 15-second test period. 
Scores are obtained for three trials on each of the five holes, repre- 
senting the total amount of time that the stylus touched the rim of 
the hole. 
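
The two scoring rules just described, the derived speed-accuracy ratio of the driving test and the contact-time score of the steadiness test, can be sketched as follows. This is an illustration under stated assumptions rather than the laboratory's own computation: the ratio is read as the accuracy count at low fixed speed minus the accuracy count at subject-controlled speed, divided by the time score, and the function names, units, and sample figures are hypothetical.

```python
def speed_accuracy_ratio(acc_low_fixed, acc_self_paced, time_score):
    """(accuracy at low fixed speed - accuracy at self-paced speed) / time score.

    Accuracy is a count of staples contacted; time_score is the time taken
    for the self-paced trial, in whatever unit the trial timer reports.
    """
    if time_score <= 0:
        raise ValueError("time score must be positive")
    return (acc_low_fixed - acc_self_paced) / time_score


def steadiness_scores(contact_seconds):
    """Total stylus contact time per hole, summed over that hole's trials."""
    return {hole: sum(trials) for hole, trials in contact_seconds.items()}


# Hypothetical figures for one subject.
ratio = speed_accuracy_ratio(acc_low_fixed=52, acc_self_paced=40, time_score=96.0)

# Hypothetical contact times (seconds) for three trials on two of the holes.
contact = {"7/16 in.": [0.0, 0.5, 0.0], "3/16 in.": [2.0, 1.5, 2.5]}
totals = steadiness_scores(contact)
```

A larger ratio in this reading means more accuracy was given up per unit of time when the subject set his own pace.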

For the visual tests we employed the master model Ortho-rater 
constructed by the Bausch and Lomb Optical Company. This de- 
vice is designed to present slides for testing various visual functions, 
with distance and illumination controlled. It consists of 2 octagonal 
slide-holding drums set inside a boxlike apparatus. A binocular 
eyepiece is located at one end of the box. One of the drums is much 
closer to the eyepiece than the other and is used for testing near 
vision; the farther drum is used for testing distant vision. The test 
slides are fastened to the drum and are easily changed by rotating 
the drum with an external handle. 

Standard Ortho-rater testi 
tests. Acuity is determined for both far 


havioral functions, but perhaps more frequen 


In the literature of human experimental psychology are studies 
with various apparatuses—often electronic 


The Tanner Auditory Perceptual Apparatus. The observer is seated 
in an individual booth and has in front of him four lights. The first 
light is a warning flash. The second indicates the signal interval, and 
may flash one or two times, depending on the experiment. The 
third light indicates the answer interval during which the observer 
indicates what signal he heard, or where he heard it. The fourth 
ght lla cacy mani light, during which the observer is told 
ee another series of lights. The observer wears 
earphones and through them hears a constant background of noise 
which is set at an amplitude of .7 volts. This is turned off only 
during practice runs. When the actual run has started, the techni- 
cian lets the observer hear the signal without noise first, then noise 
is added, and a short practice is allowed. Then 100 presentations 
are given by the electronic equipment. IBM cards are automatically 
punched recording the observer's answers on each of the presenta- 
tions. 
The performance is scored in terms of the decision-making theory 
of signal detection developed by Tanner and Swets (7). It is stated 
that under given experimental conditions there is one distribution 
of noise and another distribution of tone plus noise, and these 
distribution curves overlap somewhat. The quantity d’ is defined as 
the difference between the means of the noise and the tone-plus- 
noise distributions, divided by the standard deviation of the noise 
distribution. This d’ is measured as a function of the tone-to-noise 
ratio. Thresholds are obtained when the observer has to rely on his 
own memory of a previous tone amplitude and under circumstances 
when the previous amplitude, with which comparison is made, is 
presented by the apparatus. In another set of experiments judgments 
are made as to whether a second tone is lower or higher than the first 
when the observer has to remember the previous tone and when 
the previous tone is repeated for him by the apparatus. The thresh- 
old for Gaussian noise is also determined. Such measurements have 
been made by us with subjects under several psychopharmacological 
drug stresses, and we are studying effects of the drugs on some of 
the perceptual parameters. 
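
The definition of d’ given above translates directly into a short computation. The sketch below is only an illustration of that definition. It assumes that samples from the noise distribution and from the tone-plus-noise distribution are available as plain lists of numbers, which is a simplification, since in an actual experiment these distributions are inferred from the observer's responses and not observed directly. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def d_prime(noise, tone_plus_noise):
    """Difference between the two means, in units of the noise standard deviation."""
    sd_noise = pstdev(noise)  # population standard deviation of the noise samples
    if sd_noise == 0:
        raise ValueError("noise samples have zero spread")
    return (mean(tone_plus_noise) - mean(noise)) / sd_noise


# Hypothetical samples: the tone shifts the distribution upward by 2 units.
noise_samples = [0.0, 1.0, 2.0, 3.0, 4.0]
tone_samples = [2.0, 3.0, 4.0, 5.0, 6.0]

separation = d_prime(noise_samples, tone_samples)
```

A d’ of zero means the two distributions coincide and the tone cannot be told from noise; larger values mean less overlap. Repeating the computation at several tone-to-noise ratios gives d’ as a function of that ratio.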

Kristofferson Visual Apparatus. This apparatus (3) employs a four- 
interval, temporal forced-choice psychophysical method. The sub- 
ject has a sequence of trials made up of four successive time in- 
tervals separated by clearly audible sounds. In every trial a visual 
target of circular luminescence subtending one degree at the eye 
is superimposed on a large uniform background of moderate lumi- 
nescence at one of four time intervals. Which interval is so activated 
is randomly determined. This target is presented at the fixation 
point in a location known exactly by the subject. The exposure 
duration is 0.010 seconds. Fifty successive trials define a unit and 
require twelve minutes to complete. After one minute's rest the 
trials can be repeated. Consequently it is possible to derive tem- 
poral activity curves as well as dosage curves in this procedure. In 
work so far carried out we have a definite indication of drug stress 
effects quantifiable in this situation. 

The PSI Apparatus. Members of our group have devoted more 
effort to the development of this technique and the use of it in a 
number of different situations, including those involving drugs, 
than we have to any other single test. The PSI apparatus (a contrac- 
tion of “Problem Solving Using Information”) permits the combina- 
tion of a number of elements into various logical relationships. The 
elements are represented by invariant electronic connections which 
in the latest version are transistors built into the apparatus. The 
logical relationships between these elements can be varied by means 
of a plugboard in the rear of the device. A set of logical relation- 
ships between the elements constitutes a problem. The connections 
determining a particular problem are wired on to a plug which is 
inserted into the plugboard to constitute each problem. 

On the panel is a circular array of lights with corresponding push- 
buttons. Each light represents an element and, depending on 
whether it is on or off, the state of the element. In the center of 
the panel is a light with no pushbutton, which represents the output 
of the circular array. A disc that can be placed in the center of the 
array has arrows drawn on it which show the relationships between 
the elements. For each problem there is a different disc. All rela- 
tionships are represented by arrows, and each depicted arrow stands 
for an existent relationship. The relationships possible are con- 
junction, disjunction, and negation (hence also implication). The 
direction of relationships is indicated by the head of the arrows 
which indicate only the existence and the direction of the relation- 
ship, not the kind of relationship. For example, the arrow between 
lights 5 and 2 might mean a) that if 5 is lit, then 2 will light, or b) 
that to light 2, it is necessary but not sufficient to light 5, or c) that 
5 prevents the lighting of 2. If there is no arrow between two 
lights, there is no relationship between them; in other words, the 
null relationship is not presented. 


In all problems the task is the sam 
ing the output of the network of el 


propriate pushbuttons. In order to le 
of the network—to solve the problem—the subject must analyze the 
relationships that exist between the 


When an element is activated, the 
on the display panel goes on. Three seconds later, that light goes 
out and the lights for which the preceding light represents a suffi- 
cient condition for activation go on, remain on for three seconds, 
and in turn go out, activating those elements for which they are 
sufficient. In this way, an ordered set of consequences of the activa- 
tion of any combination of elements is presented to the subject. 
The subject is free to add further activated elements while the 
outcomes of previous button pushes are continuing. He may 
thus observe sequences of events resulting from the activation of 
elements and the interaction of multiple sequences. By trying out 
what happens when various lights or combinations of lights are on, 
the subject can uniquely determine the nature of all relationships 
in the network, and thus find out how to solve the problem. Sub- 
jects become familiar with the apparatus and situation by means 
of comprehensive illustrative problems presented in detail by the 
experimenter, and solve these sample problems before measurement 
begins. Average time for familiarization and testing on the two 
problems used most frequently is about two hours altogether. 
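
The stepwise behavior of the display can be illustrated with a small simulation. This is an illustrative model only, not the wiring of the actual apparatus. The network, the element numbers, the use of 0 for the central output light, and the rule format (a list of jointly sufficient groups plus a set of preventing elements) are all assumptions made for the example. Only the three-second step and the three kinds of relationship are taken from the description above.

```python
STEP_SECONDS = 3   # each light stays on for three seconds
OUTPUT = 0         # hypothetical label for the central output light

# Hypothetical problem. Each target maps to (groups, preventers):
#   groups     - the target lights if every member of at least one group is lit
#                (conjunction within a group, disjunction between groups)
#   preventers - any lit member blocks the target (negation)
NETWORK = {
    2: ([{5}], set()),        # 5 alone is sufficient for 2
    3: ([{1, 2}], set()),     # 1 and 2 are each necessary, jointly sufficient
    OUTPUT: ([{3}], {4}),     # 3 lights the output unless 4 is lit
}


def step(lit, network):
    """Return the set of elements lit in the next three-second interval."""
    nxt = set()
    for target, (groups, preventers) in network.items():
        fired = any(group <= lit for group in groups)
        blocked = bool(preventers & lit)
        if fired and not blocked:
            nxt.add(target)
    return nxt


def run(presses, network, max_steps=20):
    """Simulate button pushes; presses maps a step index to the buttons pushed then."""
    history = []
    lit = set()
    last_press = max(presses, default=-1)
    for k in range(max_steps):
        lit = lit | presses.get(k, set())   # consequences plus new pushes
        if not lit and k > last_press:
            break
        history.append((k * STEP_SECONDS, frozenset(lit)))
        lit = step(lit, network)
    return history


# Push 5 at the start, then 1 three seconds later: the output lights at 9 s.
solved = run({0: {5}, 1: {1}}, NETWORK)
# Same pushes plus 4 at six seconds: 4 prevents the output from lighting.
blocked = run({0: {5}, 1: {1}, 2: {4}}, NETWORK)
```

The returned history is the same kind of record the text describes as raw material: which elements were active and at what time.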

The raw material for our analysis of the problem-solving process 
is the sequence of experiments that are performed—that is, the 
sequences of buttons pushed and the time at which they are pushed. 
These data are recorded automatically on tape which can be ex- 
amined and analyzed along many mathematically derived dimen- 
sions. 

The information content of the network representing any problem 
can be determined in bits. With the possible exception of Raven's 
matrices (8) this is the only currently available test of mental 
abilities made up of problems that increase in complexity in com- 
parable units (bits). It differs from the classical intelligence tests 
like the Stanford-Binet and Wechsler-Bellevue in which the tasks 
for various age levels or subtests are entirely different in character, 
and consequently do not have known commensurability of dimen- 
sions. The problems of the PSI apparatus increase in complexity 
from a one-bit problem (push a certain button and the central light 
goes on)—the “earthworm floor”—to a very complicated sequence of 
button-pushes—the “Einstein ceiling”—by simply adding more bits 
of comparable character, representing one of the three logical re- 
lationships. 

By analyzing the nature of the actions carried out during the 
problem-solving process, taking into account their order, it is pos- 
sible to relate various aspects of the process to the information 
gained by the subject up to that point, and also to relate the product 
of the process, the solution, to the information state. The raw data 
permit the quantification of a large number of variables, some of 
which seem to be of reasonable statistical independence. Some of 
these variables are power variables such as the time required for 
solution, or the number of experiments required for solution. Others 
are more validly process variables, and have consequently been 
of greater interest to us. 

This procedure puts a sort of microscope on cognitive processes, 
including memory, learning, reasoning, and information processing. 
We are carrying out factor analytic studies to learn the dimension- 
ality of the domain of 40 or more variables, which are derivative of 
CGS and information units exclusively. The factor analytic studies 
are being done both on individuals and on groups. We hope thereby 
to find ways in which individuals and groups, or normal individuals 
and individuals under stress, differ in their cognitive processes. 
We are studying the effects of drugs on such performance, both 
in diminishing cognitive functions—behavioral toxicity—and also 
improving it. Horvath, Uhr, Kelly, and Rapaport are involved in 
various aspects of this work. 

Stroud Apparatus. Foster, of our group, 
nary studies with the Stroud apparatus. T} 
possible to present auditor 
tion can be varied by the experimenter, T 


equipm: ; 5 t 
plastic knobs 234 inches in dig re eight translucen 


meter mou 
of black plywood, Each knob contain 
arranged in a semicircle aro: 


toward or away from the 

a re fix is re 
the radius of the semicircle į een at a distance whe 
subject is seated so that t 
center. Timers control t 


successive lights, and the duration of 
When a light goes on, the subject is instructed to put it out by 
hitting the knob. The recording is automatic. Using this equipment 
Kornblum has been able to find definite drug effects. 

Tracking Apparatus. This is a standard task which has been 
studied extensively for itself alone, but has scarcely been used in 
evaluating stress effects, although it appears to us to have great 
promise in this field. The subject’s task is to keep a blip on an 
oscilloscope, the motion of which is controlled by an electronic 
problem generator, directly on cross-hair lines. He does this by 
moving a joy-stick. The blip can move at different rates and with 
differing regularity or randomness. The subject's error in the track- 
ing task can be analyzed into factors due to a) lag in response, b) 
misperception of spatial distance, c) misperception of speed of mo- 
tion, and d) misperception of acceleration of motion. Each of these 
error components can be automatically recorded in digital fashion. 
It is quite possible that some drugs or other stresses will affect 
certain of these performance factors, while others will affect other 
factors, or other functional subsystems. 
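
As an illustration of how one of these error components might be separated, the sketch below estimates the lag in response, component a), from sampled positions of the blip and of the subject's tracking output. The method shown, choosing the delay that minimizes the mean squared difference between the two records, is an assumption made for this example and is not stated in the text to be the method built into the apparatus. The function name, the sampling scheme, and the test signal are hypothetical, and components b) through d) are not modeled.

```python
import math


def estimate_lag(target, response, max_lag):
    """Return (lag, residual): the delay in samples that best aligns the
    response with the target, and the mean squared error left at that delay."""
    if len(target) != len(response):
        raise ValueError("records must have the same length")
    if max_lag < 0 or max_lag >= len(target):
        raise ValueError("max_lag must be between 0 and len(target) - 1")
    best_lag, best_err = 0, float("inf")
    for lag in range(max_lag + 1):
        # Compare the target at time i with the response at time i + lag.
        pairs = list(zip(target[: len(target) - lag], response[lag:]))
        err = sum((t - r) ** 2 for t, r in pairs) / len(pairs)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag, best_err


# Hypothetical record: the response copies the target four samples late.
target_path = [math.sin(0.3 * i) for i in range(100)]
response_path = [0.0] * 4 + target_path[:-4]

lag, residual = estimate_lag(target_path, response_path, max_lag=10)
```

Whatever error remains after the lag is removed would then be attributed to the other components, such as misperception of distance, speed, or acceleration.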


These are a few of the types of apparatus with which we are now 
experimenting. It is important, of course, to develop norms for per- 
formance on them and to learn all we can about the variables they 
measure. Many may correlate highly with variables in pencil and 
paper tests or on rating scales. But we believe that, as there are ad- 
vances in the use of such instruments, they may provide precision 
of personality measurement beyond that which at the moment 
exists. 
Summary and Conclusions 


Harold B. Pepinsky 
The Ohio State University 


A friend recently commented 
that psychologists do not listen to other people, because psychol- 
ogists prefer to tell about what they are doing. My friend, who is 
himself not one of the faithful, had just returned from a conference 
attended mainly by psychologists, and this was the impression they 
had given him. If one examines the reference lists appended to 
the foregoing papers, however, one must infer that this group of 
psychologists, at least, is well aware of past and current literature 
on the assessment of human personality. Yet with the exception of 
Robert Watson, our historian, the symposium participants have run 
true to my friend’s impression. In a very human way, each has made 
reference to the work of others as a means of justifying what he is 
up to. A brief review of what has been said and some concluding 
remarks are now in order. 


A Review 
Robert Watson has traced for us the growing concern of Ameri- 
can psychologists for objectivity in the measurement and control 
of (a) stimulus conditions, (b) the observer, and (c) the responses 
of the person being observed. At the same time, Watson argues 
that American psychologists have increased both in their awareness 
of the complexity of measuring human personality, and in their 
willingness to tackle complicated problems of measurement. He 
brings his chapter up to date by reporting on definitions of “objec- 
tive personality assessment,” with which he has been furnished by 
symposium participants. 

Performance, projection, and self description approaches are criti- 
cally reviewed by Donald Super, who comments also, upon the 
waxing and waning of fads in assessment theory and methods. He 
concludes that projection (projective) approaches have little prac- 
tical utility at either the public or private levels of measurement. 
While Super does not reject the use of performance measures, he 
cites the higher predictive validity of self description measures, 
particularly of the biographical (i.e., autobiographical) inventory. 
The next participant, Raymond Cattell, does not like scaling 
methods of assessing personality, preferring instead factor analytic 
method applied to what he calls “multivariate experiment” in the 
natural setting. Cattell is armed with an impressive array of rating, 
questionnaire, and objective test data, which have been digested in 
large amounts by electronic computers. He argues that multivariate 
factor analysis is analogous to the clinical method of personality 
assessment; factor analysis enables the psychologist to remain close 
to his data as he interprets them and yields information on both 
known and unknown clinical constructs. 
Louis McQuitty, however, thinks that his method of “successive 
agreement analysis,” used to isolate and differentiate personality 
types, is more flexible than tradition 


persons sampled, wh 


ential validity for different groupings of persons, McQuitty’s 
method appears to be a kind of Q technique, in which persons are 
intercorrelated across a set of items and then, by iterative procedure, 

sively larger clusters of like r 
whether item particular content is 
groups of persons 
own and others’ research, 


people in general), This assump- 
hypothesis,” which states that a 
[...] this empirical finding [...] studies of social desirability exemplify
[...] “construct method” of personality test construction,
[...] he clearly prefers to factor analytic or “criteri[...]
[...] idea here is that individuals and groups differ
[...] in their tendencies to select “True,” “False,” or “Undecided”
[...] but in their tendencies to give socially [...]
items as well (regardless of whether the socially desirable
[...] False). Thus it is considered par[...]
[...] clinically designated item [...]
characterizing “depression or schizophrenia” [...]
undesirable, and this expectation is supported by
[...] of Edwards and his associates. Also, persons who score
“high” [...] socially undesirable item clusters can be expected
[...] low social desirability scores, independently measured,
on other tests, which such persons tend to do.
James Miller not only sounds like an electronics engineer in his
paper, but tells us that he wants to sound like an electronics
engineer in his approach to the problem of personality
assessment. Most of the physical and biological sciences, he points
out, have common units of observation and measurement (the
centimeter-gram-second system) and, more recently, of enumeration
(the bit of information) and a classical conceptual system. In stark
contrast, psychological language, measurement, and conceptualization
[...]. His compelling argument is that the concepts and
methods of [...] systems theory can be used to advance the unity
of [...] behavioral science. First, he suggests how traits
[...] used in the description of human personality can be
more precisely redefined and measured if the human organism is
[...] an electronic system; second, he describes
[...] developed at the University of
Michigan’s Mental Health Research Institute; and third, he shows
how such tests can be used to provi[...] of human
[...]. As the title of his paper implies, a new and communicable
and valid personality theory is in the making; despite
present claims of other persons, it is not now available.
[...] among the contributors to this symposium, Wayne
Holtzman has taken as his assessment model a standard projection
device, the Rorschach, and modified it into a test that utilizes more
objective stimulus conditions and scoring procedures. In this task,
he has sought to combine the qualitative richness of this projective
method with the quantitative rigor of psychometric analysis. The 
result—the Holtzman Inkblot Test—is described following a brief, 
yet comprehensive review of methodological problems that arise in 
dealing with projective materials. 

Bernard Bass has provided an exceptionally well organized and 
scholarly background for his discussion of the Louisiana State Uni- 
versity leadership studies, including an extensive review of pub- 
lished attempts to assess the leadership variable by objective 
methods. The maturity of his paper, in an area that has been 
characterized by procedural sloppiness and sweeping generalization, 
is a testimonial to the value of a research worker’s dedication to a 
restricted methodology which, in this case, [...]
changes of attitudes or [...]
yielded an enormous supply of empirical data, which he has been
able to organize into theoretical [...]
[...]ed a number of hypotheses that can [...]
outside of the laboratory.
[...]ul that his services are about to be [...]
psychophysics, [...]

Finally, Starke Hathaway urges the development of clinical testing
procedures that provide a maximum of utility yet require a
minimum of cost and effort. Reduction of the “professional time
[...]ment” required for test giving and scoring (can a high grade
clerk or a technician do it?) illustrates the latter objective; increasing
the number of “hits” in predicting from test to criterion data
illustrates the former. Hathaway warns, however, against attempts 
to predict diagnoses or clinical estimations to the exclusion of what 
he considers a more important kind of validation, i.e., that obtained 
in the process of construct development. The construction of his 
own Psychopathic Deviate (Pd) scale, one of the measures yielded 
by his Minnesota Multiphasic Personality Inventory, is cited as a case 
in point. Development of this construct, for example, gave rise to 
the belief that diagnosed Pd’s would be relatively deficient in their 
emotional functioning. A subsequent experiment demonstrated that 
high Pd’s indeed were less susceptible than low Pd’s to autonomic 
conditioning. 


Some Concluding Remarks


The symposium participants, as a group, are dissimilar in their 
conceptions of how personality is to be assessed. Thus, for example, 
the advocated raw data of assessment range from expressed clinical 
judgments of patient behavior in the natural setting to subject re- 
sponses in a controlled laboratory situation that can be fed directly 
into an electronic computer. On one point, at least, there is over- 
whelming agreement: whatever assessment data are used must, in 
principle, be capable of being defined empirically, measured in 
amount, and recorded as scores in the public domain. It is this kind 
of belief in “objective” measurement, so obviously a shared conviction
of the contributors, that characterizes the present symposium.


On the issue of objectivity versus subjectivity in the assessment
of personality, then, there is no quarrel among the participants. But
there are other issues on which the participants appear to be in
hearty disagreement, or where additional problems of [...]
can be expected. This may seem [...] when one stops to consider
that a response made by a subject [...] stimulus condition is,
simply, the product of an interaction between subject and stimulus
condition. Yet several methodological questions, at least, are suggested
by this apparently simple statement: (1) What subject? (2)
What stimulus condition? (3) What response? (4) What antecedent
events, in respect either to subject or stimulus condition, are to
be considered in predicting what a given subject’s response will be?
An additional methodological question has been barely touched
upon by the participants but is nonetheless relevant: (5) In whose
view is the stimulus condition to be defined and in whose view is
the measured amount of the subject’s response to be assessed?
Ohio State University research productivity in organizational settings
has suggested the importance of maintaining conceptual distinctions
tinctions here among: (a) the subject, (b) the task-setter who 
prescribes the stimulus condition, and (c) the observer who reports 
on and analyzes the subject's response and who may or may not be 
identical with the subject or the task-setter. Thus, Hunt’s clinical 
judges must be viewed at the same time both as observers of patients 
whose responses are to be predicted, and as themselves subjects 
whose clinical predictions are to be treated as responses. A related 
methodological question also must be asked: (6) What is the subject’s
phenomenal view (as a measured response) of the task-setter
and the stimulus condition that [...]
important to know whether th[...]
Bass’s laboratory or in a Hathaway [...]
as playing a game as opposed to playing “for keeps,” whether the
task-setter is regarded as hostile, friendly [...]
the task condition has face validity for th[...]
that is important for him to perform.


[...] of which we can turn now.
[...] for instance, differ on the issue of item inter[...]
[...] that particular item content can safely be
[...] suggesting that subject-item interactions can
be interpreted to yield useful clinical constructs. Again, Cattell
[...] Super, in respect to the manifest versus the
[...] item. Super wants the item to have the same
meaning for the subject as i[...]


1 This last point is well covered by Joan Criswell. (See Criswell, Joan. The
psychologist as perceiver. In R. Tagiuri and L. Petrullo (Eds.), Person
perception and interpersonal relations. Stanford: Stanford Univ. Press,
1958, pp. 95-109.)
parsimoniously in terms of Edwards’ social desirability component.
Hathaway, a practicing and practical clinician, argues for what 
works and has utility in the hospital setting, by which criteria the 
presence or absence of a social desirability loading may be irrelevant 
to the interpretation of an item. 

How item-responses should be analyzed is a third issue, with 
Cattell preferring factor analysis, McQuitty supporting successive 
agreement analysis, and Edwards expounding scaling. A fourth 
issue is that of the laboratory versus a natural setting as the locus 
of data collection, although even Cattell, the most vigorous exponent 
of “observation in situ,” imposes behavioral restrictions on his subjects
in the process of obtaining information about them. Hunt’s
judges, of course, are quite free to observe patients’ natural movements
in the hospital setting; even here, though, it is assumed that
the reliability of predictions among judges is increased by control
of the stimuli to which they are exposed. Bass, in contrast, is unashamedly
at work in a laboratory, seeking lawful relationships
among variables systematically controlled and manipulated.
Strangely enough, none of the authors has stopped to define the 
term, “personality.” Therefore, we cannot know whether this is a 
definitional problem on which they are at issue. Their chapters 
indicate, however, that the participants are otherwise impressed
with the need for definitional clarity and empirical referents for
their variables, and for obtaining reliable scores based on careful
observation and measurement of the variables. What emerges from
this preoccupation is a kind of stubborn, but highly commendable
simplicity in the authors’ discussions of their work. We may conclude
that the apparent simplicity results precisely [...]
empirical study on which the chapters are based [...]
painstaking and extensive in every case; each author so obviously
knows what he is writing about. On the one hand, the papers give
encouragement to the belief that we are closer than [...]
being the proud possessors of psychological [...]. On the other
hand, the implications of the several chapters are so often at variance
with each other that the claim of any of the participants to the
possession of knowledge that orders psyc[...]
lawful association with each other is somewhat [...].

Nearly all of the authors give clear indication of [...]
that human life is very complicated and of their [...]
[...]ing the task of helping us to order and measure the data [...]
existence. Each takes himself and his work quite [...]
evidences a belief in the validity and urgency of his task. Curiously,
what emerges as a dominant theme of the symposium is a preoccupation
with method, instrumentation, and the interpretation
of data. For example, the heuristic yield of new methods for the 
rapid collection and processing of data is taken for granted by 
several of the authors, who promote the use of electronic gadgetry 
with great enthusiasm. An implicit and seductive argument that 
underlies this zeal for a “psychonomy of abundance” is that if we 
can get enough data, we are bound to have good data. 

Curiously, also—in view of a Zeitgeist for interdisciplinary re- 
search (e.g., among research fund-granting agencies), Miller is the 
only member of the group who openly and strongly advocates an 
interdisciplinary research effort. None of the participants has paid 
more than passing lip service to the possible contributions that 
sociologists, anthropologists, or other kinds of social scientists might 
make to problems of assessing human personality. Is this because
other social scientists have nothing to contribute to the symposium 
topic? One regrets that this question did not become an issue for 
the symposium. 

Despite a few understandable sins of commission and omission, 
there is much to read in the present series of papers that is rich 
and satisfying. A new and abundant harvest of ideas and facts and 
methods of inquiry has been made available for research workers 
and practitioners alike. The papers give heartening testimony to 
the fact that one can stay very close to one’s data yet, in making 
sense out of them, manifest considerable imaginativeness and productive
originality. Certainly, by comparison with the OSS and VA
assessment research of the previous decade, there is ample evidence 
furnished us here of significant accomplishment in the development 
of a methodology for the objective assessment of human personality. 